Counterfactual Density Effects and the German East--West Income Gap

We propose a novel framework for conducting causal inference based on counterfactual densities. While the current paradigm of causal inference is mostly focused on estimating average treatment effects (ATEs), which restricts the analysis to the first…

Authors: Georg Keilbar, Sonja Greven

Georg Keilbar 1,* and Sonja Greven 1
1 Chair of Statistics, Humboldt-Universität zu Berlin, Germany
* Corresponding author, georg.keilbar@hu-berlin.de
March 31, 2026

Abstract

We propose a novel framework for conducting causal inference based on counterfactual densities. While the current paradigm of causal inference is mostly focused on estimating average treatment effects (ATEs), which restricts the analysis to the first moment of the outcome variable, our density-based approach is able to detect causal effects based on general distributional characteristics. Following the Oaxaca-Blinder decomposition approach, we consider two types of counterfactual density effects that together explain observed discrepancies between the densities of the treated and control group. First, the distribution effect is the counterfactual effect of changing the conditional density of the control group to that of the treatment group, while keeping the covariates fixed at the treatment group distribution. Second, the covariate effect represents the effect of a hypothetical change in the covariate distribution. Both effects have a causal interpretation under the classical unconfoundedness and overlap assumptions. Methodologically, our approach is based on analyzing the conditional densities as elements of a Bayes Hilbert space, which preserves the non-negativity and integration-to-one constraints. We specify a flexible functional additive regression model estimating the conditional densities. We apply our method to analyze the German East–West income gap, i.e., the observed differences in wages between East Germans and West Germans.
While most of the existing studies focus on the average differences and neglect other distributional characteristics, our density-based approach is suited to detect all nuances of the counterfactual distributions, including differences in probability masses at zero.

Keywords: Density regression, decomposition methods, causal inference.

1 Introduction

The most prevalent approaches in causal inference are based on the study of mean-based quantities such as the average treatment effect (ATE) and the average treatment effect on the treated (ATT). While it is true that ATEs are easy to interpret and can be estimated reliably in many situations, they are nonetheless restricted to identifying location shifts in the data while ignoring effects that go beyond the first moment of the distribution. Recently, there has been growing interest in more nuanced approaches based on quantiles or other distributional characteristics (Chernozhukov and Hansen, 2005; Firpo, 2007; Chernozhukov et al., 2013).

In this paper, we advocate for a density-focused approach for causal inference. Similar to quantiles, densities reflect the entire distribution of the variable of interest. However, densities offer some important advantages. The computational burden is reduced, as estimating conditional densities involves a single estimate, whereas distributional and quantile regressions must be run separately for different values of the threshold index and quantile level, respectively. There is also no monotonicity issue in the estimation, unlike in quantile-based approaches. Consequently, rearrangement methods to avoid quantile crossing, such as those proposed in Chernozhukov et al. (2010), are not required.
Additionally, densities are arguably better suited than distribution and quantile functions for displaying and intuitively understanding the shape of the data, e.g., in the presence of bimodalities or shifts in the probability mass. Finally, methods based on quantile regression usually require continuous distributions, while our density-based approach is also applicable to discrete and mixed-type outcome variables. These issues are highly relevant for the income distributions considered in our empirical application, which exhibit both bimodalities and probability masses at zero.

Our definition of counterfactual densities and the resulting density effects build on the decomposition literature, originating with Blinder (1973) and Oaxaca (1973). Specifically, we define counterfactual densities based on the conditional density of group A, evaluated as if it were exposed to the covariate distribution of another group, B. The original Oaxaca–Blinder decomposition was introduced for the purpose of explaining differences in the means of two groups. A special case of the decomposition focusing on proportions and categorical covariates had already been proposed by Kitagawa (1955). For approaches beyond the mean, similar decompositions based on counterfactual quantities have been studied for quantiles and other distributional characteristics (DiNardo et al., 1996; Firpo et al., 2018; Chernozhukov et al., 2013). We refer to Fortin et al. (2011) for a comprehensive overview of decomposition methods in economics. These approaches are based on an additive Oaxaca–Blinder type decomposition. As a crucial difference, we instead focus on a multiplicative decomposition.

We consider two types of counterfactual density effects that admit a causal interpretation under the standard unconfoundedness and overlap assumptions.
First, the distribution effect captures the counterfactual impact of changing the conditional density from that of the control group to that of the treatment group, while holding the covariate distribution fixed at the treatment group level. Second, the covariate effect represents the effect of a hypothetical change in the covariate distribution, with the conditional density held fixed at the control group.

The estimation of counterfactual densities depends critically on accurate estimates of conditional densities. However, many existing approaches have important limitations. On the one hand, fully nonparametric estimators suffer from the curse of dimensionality, making them impractical even in moderately dimensional settings. Examples include the log-spline approach of Stone (1991, 1994) and local polynomial-based estimators (Fan et al., 1996; Cattaneo et al., 2024). On the other hand, fully parametric models rely on strong distributional assumptions, which can lead to model misspecification when these assumptions are violated. In this paper, we instead rely on the Bayes Hilbert space approach for conditional density estimation of Maier et al. (2025a). The authors propose a flexible structured additive regression model that obeys the logic of densities, i.e., the estimated densities are non-negative and integrate to one. Estimation relies on the use of basis functions (e.g., splines) and is based on a Poisson approximation to the Bayes Hilbert space likelihood.

Our counterfactual density methodology is motivated by the analysis of the East–West income gap in Germany. This empirical application illustrates the advantages of density-based approaches for several reasons. First, some of the estimated densities exhibit heavy skewness and bimodality, features that are easily detectable in density plots but may be difficult to identify using estimated quantile or distribution functions.
Second, the income variable is zero-inflated, with the fraction of zeros varying with the covariates, which poses challenges for quantile-based methods. This is particularly relevant since restricting attention to the positive part of the income distribution would avoid this problem only at the expense of losing information about the unemployed.

Empirically, we find that the East–West gap has narrowed over the 30 years since reunification; however, notable differences persist. Our results suggest that these differences are largely driven by the conditional distribution rather than by differences in the composition of covariates. That is, differences in the covariate distributions explain only a small part of the observed differences in the income distributions. Finally, we find that these differences are much more pronounced when focusing on the male subpopulation. We therefore conclude that the East–West income gap in Germany is, to a large extent, a male-specific issue.

Methods for counterfactual distributions based on quantile and distributional regression instead of density regression have been discussed before. Chernozhukov et al. (2013) is one of the most closely related papers. They focus on distributional (i.e., for the cumulative distribution function) and quantile effects, but consider a similar Oaxaca–Blinder decomposition of effects. Instead of estimating the conditional density, their approach relies on quantile and distribution function regression. For both approaches, they rely on known basis functions of covariates. Similarly, Machado and Mata (2005) consider an Oaxaca–Blinder decomposition based on estimating conditional quantiles. In particular, they estimate counterfactual densities based on linear quantile regression fits for several quantile levels and a simulation-based procedure.
That is, they generate counterfactual outcomes by using the estimated quantile regression coefficients, by drawing from the covariate distribution (to integrate out the effect of the covariates), and by sampling quantile ranks from a uniform distribution. The resulting counterfactual densities need to be estimated by kernel density estimation. We consider our density regression-focused approach to be complementary to these existing approaches. This is particularly the case in settings with mixed-type dependent variables, as in our empirical application. Quantile-based methods are not applicable in such scenarios.

The issue of counterfactual density estimation has been studied in the existing literature. DiNardo et al. (1996) consider a similar decomposition of effects and a similar plug-in estimator for counterfactual densities. However, (i) they do not explicitly discuss causal implications of their estimated densities, (ii) they use kernel density estimates, which restricts the applicability in many settings, and (iii) they consider differences between densities instead of ratios, which might be hard to interpret in low-density regions of the support. Martinez-Taboada and Kennedy (2024) model counterfactual densities using kernel Stein discrepancies, but rely on parametric assumptions. Melnychuk et al. (2023) propose a deep learning method called 'Interventional Normalizing Flows' for estimating fully parametric counterfactual densities. Similarly, Kennedy et al. (2023) approximate counterfactual densities by parametric models; for instance, relying on the exponential family, Gaussian mixture models, or on truncated series regression. Theoretically, they provide asymptotic results for the estimator, such as root-n consistency and semiparametric efficiency bounds. They also consider 'density effects', which they define based on distances or other measures of discrepancy.
However, the above-mentioned approximations suffer from several limitations, namely potential model misspecification and the risk that the estimated densities can take negative values and do not integrate to one. Further, the specific setting of our empirical application with bimodalities and point masses cannot be adequately addressed by fixed parametric distributions. The focus of the present paper is therefore different from that of the above literature for two reasons. First, we explicitly embed our framework for causal inference on counterfactual densities within the decomposition methods literature. Second, while the above papers try to detect discrepancies between two densities in the form of scalar quantities, our method allows the localization of the regions in the support of the dependent variable where the discrepancies are substantial, by relying on ratios instead of distances.

Our contributions are fourfold. First, we propose a causal inference framework based on a multiplicative Oaxaca–Blinder decomposition of counterfactual densities. Compared with conventional additive decompositions, we thus focus on relative differences instead of absolute differences between densities, which has clear advantages for the analysis of low-density regions. A multiplicative approach further obeys the logic of Bayes Hilbert spaces, a suitable functional space for density functions. Second, we propose an estimation procedure that relies on a flexible additive model specification for the conditional densities that avoids both the restrictiveness of parametric models and the curse of dimensionality of fully nonparametric models. Further, the approach does not require continuous outcome variables but also allows for discrete and mixed-type distributions, as is the case in our empirical application. In practice, estimation can be carried out as an approximate Poisson regression problem.
Third, the additive specification of the conditional densities allows us to further isolate the effect of different covariates on the covariate effect. Finally, on the empirical side, we use our counterfactual density methodology to gain more detailed insights into the East–West income gap in Germany that would otherwise have been lost when relying on mean-based procedures.

The remainder of this paper is structured as follows. Section 2 introduces the model setup and notation and provides a definition of counterfactual densities as well as the corresponding counterfactual density effects. The Bayes Hilbert space approach for the estimation of conditional densities is discussed in Section 3. We provide a short simulation study to analyze the finite-sample properties of our estimation method in Section 4. In Section 5 we employ our counterfactual density methodology to study the East–West income gap in Germany. Section 6 concludes.

2 Counterfactual Densities and Causal Effects

2.1 Setup and Notation

In this section, we introduce our framework for counterfactual densities. For this purpose, we will use the following notation. Let $Y \in \mathbb{R}$ be the dependent variable, $X \in \mathbb{R}^d$ the vector of covariates, and let $D \in \{0, 1\}$ be the treatment variable, which takes the value one for the treated and zero otherwise. For the sake of exposition, we focus on binary treatment variables. However, extensions to more than two treatment groups will be discussed later. We consider the following potential outcome setting (Rubin, 1974),

$$Y = D\,Y_1 + (1 - D)\,Y_0,$$

where $Y_1$ is the potential outcome for the treated and $Y_0$ the potential outcome for the control group. Since only one potential outcome is observed, $Y_0$ is a latent variable for the treatment group and $Y_1$ for the control group. Let $f_{Y\langle 1,1\rangle}$ and $f_{Y\langle 0,0\rangle}$ denote the unconditional densities of $Y_1$ and $Y_0$, respectively.
Further define the conditional densities for treatment and control group as $f_{Y_1|X}(y \mid x)$ and $f_{Y_0|X}(y \mid x)$.

In many applications, it may be of interest to define densities for the counterfactual quantities. For this purpose, $f_{Y\langle 1,0\rangle}$ denotes the (unconditional) counterfactual density of the treated if they had faced the covariate distribution of the control group,

$$f_{Y\langle 1,0\rangle}(y) := \int_{\mathcal{X}_0} f_{Y_1|X}(y \mid x)\, dF_{X_0}(x), \tag{1}$$

where $F_{X_0}$ is the marginal cumulative distribution function (cdf) and $\mathcal{X}_0$ is the support of $X$ for the control group. Vice versa, we can define $f_{Y\langle 0,1\rangle}$, the counterfactual density of the untreated if they had faced the covariate distribution of the treated group. These quantities are the object of study in DiNardo et al. (1996). Chernozhukov et al. (2013) introduce a similar relationship for counterfactual distribution functions.

Equation (1) reveals that the counterfactual density is completely determined by the conditional density of the treated group and the marginal covariate distribution function of the control group. The crucial part in the estimation of counterfactual densities thus is to devise a suitable estimator of the conditional density. We will discuss this in detail in Section 3.

2.2 Causal Inference and Counterfactual Density Effects

To attribute a causal interpretation to the counterfactual densities introduced in the previous subsection, we have to impose the following assumptions.

Assumption 1 (Unconfoundedness). Assume that $(Y_0, Y_1) \perp\!\!\!\perp D \mid X$.

Assumption 2 (Overlap). $0 < P(D = 1 \mid X = x) < 1$ for all $x \in \mathcal{X}$, where $\mathcal{X}$ denotes the support of $X$.

Both assumptions are commonly used in the causal inference literature (e.g., Rosenbaum and Rubin (1983)), and are not particular to our density-focused framework. However, they need to be carefully checked and discussed in any application of our framework.
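To make equation (1) concrete, consider a toy Gaussian model. This is an illustration only and is not taken from the paper: the scalar covariate, the linear conditional mean, and the symbols $\alpha_1, \beta_1, \sigma_1^2, m_k, v_k$ are all assumptions introduced here. In this case the integral in (1) has a closed form, which shows how the covariate distribution of the reference group enters the counterfactual density.

```latex
% Toy model (assumed for illustration): scalar covariate, Gaussian everything.
%   Y_1 | X = x  ~  N(\alpha_1 + \beta_1 x, \sigma_1^2),
%   X | D = k    ~  N(m_k, v_k),   k = 0, 1.
\begin{align*}
f_{Y\langle 1,0\rangle}(y)
  &= \int \varphi\bigl(y;\ \alpha_1 + \beta_1 x,\ \sigma_1^2\bigr)\,
          \varphi\bigl(x;\ m_0,\ v_0\bigr)\, dx
   = \varphi\bigl(y;\ \alpha_1 + \beta_1 m_0,\ \sigma_1^2 + \beta_1^2 v_0\bigr), \\
f_{Y\langle 1,1\rangle}(y)
  &= \varphi\bigl(y;\ \alpha_1 + \beta_1 m_1,\ \sigma_1^2 + \beta_1^2 v_1\bigr),
\end{align*}
% where \varphi(\,\cdot\,; m, s^2) denotes the N(m, s^2) density.
```

In this toy case, the factual and counterfactual densities of the treated share the same conditional model and differ only through the mean and variance of the covariate distribution they are averaged over.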
A central issue is to define causal quantities which (i) are relevant for practitioners and (ii) are based on the counterfactual densities. One possibility is to consider the decomposition by Oaxaca (1973), originally introduced for the mean, but studied by DiNardo et al. (1996) in the context of densities and by Chernozhukov et al. (2013) for distribution and quantile effects. The decomposition for densities is given by

$$f_{Y\langle 1,1\rangle} - f_{Y\langle 0,0\rangle} = \bigl[f_{Y\langle 1,1\rangle} - f_{Y\langle 0,1\rangle}\bigr] + \bigl[f_{Y\langle 0,1\rangle} - f_{Y\langle 0,0\rangle}\bigr].$$

This gives three different effects, (i) the effect of changing the conditional density, (ii) the effect of changing the covariate distribution, and (iii) a combination of both.

Recently, Kennedy et al. (2023) studied another quantity, which they labeled as 'density effects'. Consider two densities, $g_1(y)$ and $g_0(y)$, and some discrepancy function $h : \mathbb{R}^2 \to \mathbb{R}_+$. Then, their density effects are scalar quantities defined as

$$\psi_h = D_h\{g_1(y), g_0(y)\} = \int h\{g_1(y), g_0(y)\}\, g_0(y)\, dy.$$

Examples for $D_h$ include the total variation distance and the KL divergence. While Kennedy et al. (2023) did not explicitly consider a decomposition approach, their measure can easily be incorporated under an Oaxaca-Blinder type decomposition. Potential downsides of this approach are limited interpretability and relevance for practitioners, as well as the loss of information entailed by aggregating discrepancies between densities into a single number. Even though it might be a suitable measure to detect the existence of possible discrepancies, it does not provide any information on the direction of the effect. In general, focusing on a scalar quantity leads to a loss of a large proportion of the information.
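The loss of directional information can be seen in a small numerical sketch. The setup below is purely illustrative and not from the paper: a standard normal reference density and two hypothetical comparison densities shifted by +0.5 and -0.5. Both shifts yield the same total variation distance, while the pointwise density ratios move in opposite directions.

```python
import math
import numpy as np

def normal_pdf(y, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def integrate(values, grid):
    """Trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

# Hypothetical densities: reference g0 and two shifted alternatives.
y = np.linspace(-8.0, 8.0, 3201)
g0 = normal_pdf(y, 0.0)
g1_up = normal_pdf(y, 0.5)     # mass moved to the right
g1_down = normal_pdf(y, -0.5)  # mass moved to the left

# Total variation distance, a scalar summary of the discrepancy.
tv_up = 0.5 * integrate(np.abs(g1_up - g0), y)
tv_down = 0.5 * integrate(np.abs(g1_down - g0), y)

# Pointwise ratios keep the direction of the discrepancy.
ratio_up = g1_up / g0
ratio_down = g1_down / g0
k = int(np.argmin(np.abs(y - 2.0)))  # grid index closest to y = 2

print(f"TV (up) = {tv_up:.4f}, TV (down) = {tv_down:.4f}")
print(f"ratio at y = 2: up = {ratio_up[k]:.3f}, down = {ratio_down[k]:.3f}")
```

The two scalar summaries coincide by symmetry, whereas the ratio at y = 2 lies above one for the right shift and below one for the left shift.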
We argue that when studying counterfactual densities, the focus should lie on the analysis of the heterogeneity of possible effects, i.e., how the treatment affects certain regions of the distribution. For instance, the treatment can affect only the lower or upper tail of the distribution, or it can affect the entire distribution through a shift in the mean. The introduction of a minimum wage, for example, will primarily affect the lower tail of the income distribution.

In a similar spirit to the decomposition of Oaxaca (1973) and Blinder (1973), we can consider an alternative decomposition of effects. Instead of looking at differences between densities, we consider ratios,

$$\frac{f_{Y\langle 1,1\rangle}(y)}{f_{Y\langle 0,0\rangle}(y)} = \frac{f_{Y\langle 1,1\rangle}(y)}{f_{Y\langle 0,1\rangle}(y)} \times \frac{f_{Y\langle 0,1\rangle}(y)}{f_{Y\langle 0,0\rangle}(y)}. \tag{2}$$

The motivation for the use of a multiplicative decomposition is twofold. First, for the estimation of conditional densities we rely on the use of Bayes Hilbert spaces, which are suitable spaces for density functions. These are vector spaces in which addition corresponds to multiplication and subtraction corresponds to taking ratios (for details see Section 3). Second, and related to this point, the use of ratios offers clear advantages over differences for the analysis of densities. Ratios can detect discrepancies along the whole domain of the density, whereas differences are necessarily small in regions with low density values; tail regions, for example, can never show differences as large as those in high-density regions.

Similar to Chernozhukov et al. (2013), we can define three kinds of effects.

Type 1. The density effect of changing the conditional density (distribution effect),

$$\mathrm{DE}(y) := \frac{f_{Y\langle 1,1\rangle}(y)}{f_{Y\langle 0,1\rangle}(y)} = \frac{\int_{\mathcal{X}_1} f_{Y_1|X}(y \mid x)\, dF_{X_1}(x)}{\int_{\mathcal{X}_1} f_{Y_0|X}(y \mid x)\, dF_{X_1}(x)}.$$

Type 2.
The density effect of changing the covariate distribution (covariate effect),

$$\mathrm{CE}(y) := \frac{f_{Y\langle 0,1\rangle}(y)}{f_{Y\langle 0,0\rangle}(y)} = \frac{\int_{\mathcal{X}_1} f_{Y_0|X}(y \mid x)\, dF_{X_1}(x)}{\int_{\mathcal{X}_0} f_{Y_0|X}(y \mid x)\, dF_{X_0}(x)}.$$

Type 3. The density effect of changing both the conditional density and the covariate distribution (total effect),

$$\mathrm{TE}(y) := \frac{f_{Y\langle 1,1\rangle}(y)}{f_{Y\langle 0,0\rangle}(y)} = \frac{\int_{\mathcal{X}_1} f_{Y_1|X}(y \mid x)\, dF_{X_1}(x)}{\int_{\mathcal{X}_0} f_{Y_0|X}(y \mid x)\, dF_{X_0}(x)}.$$

Remark 1. For the sake of exposition, we currently focus on the special case of a binary treatment variable. But similar to Chernozhukov et al. (2013), the framework can easily be generalized to $K$ treatment groups and thus $K$ different potential outcomes. In line with the notation of equation (1), the corresponding counterfactual density, $f_{Y\langle k,l\rangle}$, would for instance be based on the conditional distribution of group $k$ and the covariate distribution of group $l$.

Remark 2. While many real data applications have dependent variables with a continuous distribution, it is also worthwhile to study the case of discrete or mixed outcomes. For instance, income distributions typically have a point mass at zero income.

2.3 Decomposing the Distribution and Covariate Effect

An advantage of the classical mean-based Oaxaca-Blinder approach is the possibility to further decompose the distribution and covariate effects into individual contributions of covariates. Unfortunately, this advantage does not directly translate to the study of other nonlinear distributional quantities, such as quantile or distributional effects. Rothe (2015) demonstrates this problem for quantile treatment effects, even in the case of a linear model for the conditional quantiles. This is due to potential dependence among different covariates. The same issue complicates any attempt to decompose our counterfactual density effects additively.
As for quantile effects, the problem even persists in the case of a purely multiplicative density regression model as we will present in Section 3. A possible but not entirely satisfactory solution for this issue is the sequential conditioning approach of Chernozhukov et al. (2013), which suffers from the issue of path-dependence: in most applications the order of covariates is chosen arbitrarily. Further, Rothe (2012) argues that such an approach is unable to accurately reflect the impact of group differences in the marginal distribution of a single covariate.

Instead, we propose to isolate the contribution of individual covariates by considering marginal covariate effects, holding all remaining variables fixed at their control group distribution. This approach avoids path-dependence and has a transparent interpretation: each quantity measures the effect on the outcome density of shifting a single covariate's distribution from the control to the treatment group, while leaving the joint distribution of the remaining covariates unchanged. While the resulting quantities do not multiply to the total covariate effect, they nonetheless provide interpretable summaries of which covariates drive the overall covariate effect.

Recall the definition of the covariate effect,

$$\mathrm{CE}(y) = \frac{\int f_{Y_0|X}(y \mid x)\, dF_{X_1}(x)}{\int f_{Y_0|X}(y \mid x)\, dF_{X_0}(x)}.$$

Consider the marginal effect of the $j$-th variable on the density of $Y_k$, after integrating out the effect of all other variables using the marginal covariate distribution of group $l$,

$$\tilde{h}_{j,k|l}(y \mid x_j) = \int f_{Y_k|X}(y \mid x)\, dF_{X_l,-j}(x_{-j}),$$

for $k, l = 0, 1$, where $F_{X_l,-j}$ denotes group $l$'s cdf of all covariates except variable $j$. Related quantities have been studied in the context of nonparametric estimation of the conditional mean, see e.g., Linton and Nielsen (1995) and Härdle et al. (2004).
Then we can consider the following quantity to measure variable $X_j$'s contribution to the covariate effect, by changing its distribution (instead of that of all $X$) from control to treatment group,

$$\mathrm{CE}_j(y) := \frac{\int f_{Y_0|X}(y \mid x)\, dF_{X_0,-j}(x_{-j})\, dF_{X_1,j}(x_j)}{\int f_{Y_0|X}(y \mid x)\, dF_{X_0}(x)} = \frac{\int \tilde{h}_{j,0|0}(y \mid x_j)\, dF_{X_1,j}(x_j)}{\int \tilde{h}_{j,0|0}(y \mid x_j)\, dF_{X_0,j}(x_j)}. \tag{3}$$

Under the additive model specification for the conditional density that we will introduce in Section 3, the partial effect $\tilde{h}_{j,0|0}$ in (3) corresponds to the additive effect of variable $X_j$. The expression is therefore independent of the marginal distribution of $X_{-j}$ in this special case.

The quantity $\mathrm{CE}_j$ answers the following question: how would the outcome density change if only the marginal distribution of covariate $X_j$ were shifted from that of the control group to that of the treatment group, while all other covariates remain distributed as in the control group? It provides a direct answer to this question at each point $y$ of the support of the dependent variable, rather than compressing it into a scalar summary as in the mean case.

Complementary to (3), one can further analyze the distribution effect by looking at the contribution of a given covariate $X_j$ to it. We define this contribution as follows,

$$\mathrm{DE}_j(y) := \frac{\int f_{Y_1|X}(y \mid x)\, dF_{X_1,-j}(x_{-j})\, dF_{X_0,j}(x_j)}{\int f_{Y_0|X}(y \mid x)\, dF_{X_0}(x)} = \frac{\int \tilde{h}_{j,1|1}(y \mid x_j)\, dF_{X_0,j}(x_j)}{\int \tilde{h}_{j,0|0}(y \mid x_j)\, dF_{X_0,j}(x_j)}. \tag{4}$$

The quantity $\mathrm{DE}_j$ compares the partial effect of covariate $X_j$ on the conditional density across the two groups. Specifically, the numerator evaluates the treatment group's structural relationship between $X_j$ and $Y$, while the denominator evaluates the control group's, both integrated over the common (control group) distribution of $X_j$.
Thus, $\mathrm{DE}_j(y)$ measures how much the density at $y$ would change if only the way $X_j$ shapes the conditional density were switched from the control to the treatment group specification, while the distribution of $X_j$ itself is held fixed. This isolates the role of $X_j$ in driving the distribution effect, and can be a useful tool for identifying the set of variables that are most responsible for the distribution effect.

3 Counterfactual Density Estimation

In this section, we present our estimation procedure for the counterfactual densities. The critical step in the estimation is to obtain a suitable estimate of the conditional densities. Let $\{(Y_{ki}, X_{ki})\}_{i=1}^{n_k}$ denote a sample of $n_k$ i.i.d. copies of $(Y_k, X_k)$, for $k = 0, 1$. We consider a plug-in estimator of the form

$$\hat{f}_{Y\langle 1,0\rangle}(y) := \int_{\mathcal{X}_0} \hat{f}_{Y_1|X}(y \mid x)\, d\hat{F}_{X_0}(x),$$

where $\hat{f}_{Y_1|X}(y \mid x)$ is an estimator of the conditional density and

$$\hat{F}_{X_0}(x) = \frac{1}{n_0} \sum_{i=1}^{n_0} \mathbf{1}\{X_{0i} \le x\}$$

is the empirical distribution function of $X_0$, with the inequality $X_{0i} \le x$ interpreted entry-wise for each $x_j$. $\hat{f}_{Y\langle 0,1\rangle}(y)$ and $\hat{F}_{X_1}(x)$ can be defined analogously.

DiNardo et al. (1996) propose a kernel density estimation procedure for the conditional densities. However, such fully nonparametric methods suffer from the curse of dimensionality and do not work well in settings with moderate to large dimensions of covariates. Other approaches suffer from restrictive parametric assumptions or do not provide suitable estimates for densities, i.e., the non-negativity or integrating-to-one constraints might be violated. Instead, for the estimation of the conditional densities we follow the Bayes Hilbert space approach of Maier et al. (2025a), who developed a flexible additive model framework for modeling conditional densities.
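Whatever estimator is used for the conditional densities, integrating against the empirical distribution function amounts to averaging the fitted conditional density over the observed covariates of the reference group. The Python sketch below illustrates this plug-in step and the resulting ratio effects. It is a minimal illustration under stated assumptions: the Gaussian conditional densities `f1` and `f0`, the simulated scalar covariates, and all function names are hypothetical stand-ins and not the Bayes Hilbert space estimator.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y (broadcasts over arrays)."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def integrate(values, grid):
    """Trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def plug_in_density(cond_density, x_sample, y_grid):
    """Average cond_density(y, x_i) over the sample, i.e. integrate the
    conditional density against the empirical cdf of the covariates."""
    return cond_density(y_grid[None, :], x_sample[:, None]).mean(axis=0)

# Hypothetical "fitted" conditional densities for treated (1) and control (0).
f1 = lambda y, x: normal_pdf(y, 1.0 + 0.8 * x, 1.0)
f0 = lambda y, x: normal_pdf(y, 0.5 * x, 1.0)

# Simulated covariate samples of the two groups (assumption for this sketch).
rng = np.random.default_rng(0)
x1 = rng.normal(1.0, 1.0, size=500)
x0 = rng.normal(0.0, 1.0, size=400)
y_grid = np.linspace(-4.0, 6.0, 201)

f11 = plug_in_density(f1, x1, y_grid)  # treated model, treated covariates
f10 = plug_in_density(f1, x0, y_grid)  # treated model, control covariates
f01 = plug_in_density(f0, x1, y_grid)  # control model, treated covariates
f00 = plug_in_density(f0, x0, y_grid)  # control model, control covariates

DE = f11 / f01  # distribution effect
CE = f01 / f00  # covariate effect
TE = f11 / f00  # total effect
```

By construction, `TE` equals `DE * CE` at every grid point, which mirrors the multiplicative decomposition in (2).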
3.1 Bayes Hilbert Spaces

Before setting up the density regression model, we first provide a concise introduction to Bayes Hilbert spaces (van den Boogaart et al., 2010, 2014). For a more detailed introduction we refer the reader to Maier et al. (2025b) and Maier et al. (2025a). The Bayes Hilbert space on the measurable space $(\mathcal{T}, \mathcal{A})$ with reference measure $\mu$ is defined by

$$B^2(\mu) = B^2(\mathcal{T}, \mathcal{A}, \mu) := \Bigl\{ f \in B(\mu) \,\Big|\, \int_{\mathcal{T}} (\log f)^2\, d\mu < \infty \Bigr\},$$

where $B(\mu)$ is a Bayes space (i.e., a set of equivalence classes of $\mu$-densities that are $\mu$-a.e. positive and unique) with reference measure $\mu$. This is a vector space with addition, $f_1 \oplus f_2 =_B f_1 f_2$, and scalar multiplication, $\alpha \odot f_1 =_B (f_1)^{\alpha}$, for $f_1, f_2 \in B^2(\mu)$ and $\alpha \in \mathbb{R}$, where $=_B$ denotes equality up to scale. A crucial concept in this framework is the centered log-ratio (clr) transformation,

$$\mathrm{clr}(f) := \log(f) - \frac{1}{\mu(\mathcal{T})} \int_{\mathcal{T}} \log(f)\, d\mu,$$

which maps an element in $B^2(\mu)$ to a function in $L^2_0(\mu) = L^2_0(\mathcal{T}, \mathcal{A}, \mu) := \{ \tilde{f} \in L^2(\mu) \mid \int_{\mathcal{T}} \tilde{f}\, d\mu = 0 \}$, a closed subspace of $L^2(\mu)$. Transforming the data is beneficial for practical implementation and computational reasons as it enables the use of tools established for $L^2$-spaces. The clr transform is an isometric isomorphism, and it is bijective with inverse transformation $\mathrm{clr}^{-1}(\tilde{f}) =_B \exp(\tilde{f})$. $B^2(\mu)$ is a Hilbert space, with inner product $\langle f_1, f_2 \rangle_{B^2(\mu)} = \int_{\mathcal{T}} \mathrm{clr}(f_1) \cdot \mathrm{clr}(f_2)\, d\mu$.

Although Bayes Hilbert spaces are defined for arbitrary measurable spaces, we will focus on $\mathcal{T} \subset \mathbb{R}$. Following Maier et al. (2025a), we distinguish between three cases. For the continuous case, we have $\mathcal{T} = [a, b]$ and $\mu$ is the Lebesgue measure $\lambda$ on $\mathcal{T}$. For the discrete case, $\mathcal{T} = \{t_1, \ldots, t_D\}$ and $\mu$ is a weighted sum $\delta$ of Dirac measures. Finally, we consider the mixed case with $\mathcal{T} = [a, b] \cup \{t_1, \ldots, t_D\}$ and $\mu = \lambda + \delta$. To demonstrate the importance of the latter case, note that in our empirical application we will look at income distributions in Germany, which have mixed-type densities with additional point masses at zero and for incomes above a certain threshold.

3.2 Additive Density Regression in Bayes Hilbert Spaces

We now present the flexible additive density regression setup of Maier et al. (2025a). In the following, we suppress the group-specific index of the data for the sake of exposition. Consider an i.i.d. sample of observations, $(y_i, x_i) \in \mathcal{T} \times \mathcal{X}$, $\mathcal{X} \subseteq \mathbb{R}^d$, $i = 1, \ldots, n$. The conditional density of $Y$ given $X = x_i$, denoted by $f_i := f(Y \mid X = x_i)$, is assumed to be an element of the Bayes Hilbert space $B^2(\mu)$. As described in the previous subsection, the framework is flexible enough to handle continuous, discrete, as well as mixed distributions and data. We assume the following additive structure,

$$f_i = \bigoplus_{j=1}^{J} h_j(x_i), \tag{5}$$

where the partial effects $h_j$ are also elements of the Bayes Hilbert space, $h_j(x_i) \in B^2(\mu)$. Each partial effect can depend on one, several (for interactions) or no covariates (the intercept) and can be linear or nonlinear in $x_i$. Each effect is assumed to be represented by the following tensor product basis,

$$h_j(x) = \bigoplus_{l=1}^{d_j} \bigoplus_{m=1}^{d_{\mathcal{T}}} \theta_{j,l,m} \odot b_{j,l}(x) \odot b_{\mathcal{T},m}, \tag{6}$$

where $b_{j,l} : \mathbb{R}^d \to \mathbb{R}$ and $b_{\mathcal{T},m} \in B^2(\mu)$ are basis functions over the covariates and over $\mathcal{T}$, respectively, and $\theta_{j,l,m} \in \mathbb{R}$ are the corresponding coefficients. For identification, we center smooth main effects around the intercept $\beta_0$ and interactions around corresponding main effects, see Maier et al. (2025a).

Remark 3.
Under model specification (5), and if the partial effects contain no interaction terms, we can rewrite the decomposition of the covariate effect introduced in (3) as

$$\mathrm{CE}_j(y) = \frac{\int f_{Y_0 \mid X}(y \mid x) \, dF_{X_0,-j}(x_{-j}) \, dF_{1,j}(x_j)}{\int f_{Y_0 \mid X}(y \mid x) \, dF_{X_0}(x)} = \frac{\int h_{0,j}(y, x_j) \, dF_{1,j}(x_j)}{\int h_{0,j}(y, x_j) \, dF_{0,j}(x_j)}.$$

Applying the centered log-ratio (clr) transformation to (5) and using the basis function representation in (6) yields

$$\tilde{f}_i = \mathrm{clr}(f_i) = \sum_{j=1}^{J} \sum_{l=1}^{d_j} \sum_{m=1}^{d_{\mathcal{T}}} \theta_{j,l,m} \, b_{j,l}(x_i) \, \tilde{b}_{\mathcal{T},m} = \left( b(x_i) \otimes \tilde{b}_{\mathcal{T}} \right)^{\top} \theta,$$

where $\tilde{b}_{\mathcal{T},m} = \mathrm{clr}(b_{\mathcal{T},m})$, and $b(x) = (b_{1,1}(x), \ldots, b_{J,d_J}(x))^{\top} \in \mathbb{R}^{\sum_{j=1}^{J} d_j}$ and $\tilde{b}_{\mathcal{T}} = (\tilde{b}_{\mathcal{T},1}, \ldots, \tilde{b}_{\mathcal{T},d_{\mathcal{T}}})^{\top} \in (L^2_0(\mu))^{d_{\mathcal{T}}}$ are vectors of basis functions. The corresponding parameter vector is $\theta = (\theta_1^{\top}, \ldots, \theta_J^{\top})^{\top} \in \mathbb{R}^{R}$ with $\theta_j = (\theta_{j,1,1}, \ldots, \theta_{j,d_j,d_{\mathcal{T}}})^{\top}$, and the dimension of $\theta$ is $R = \sum_{j=1}^{J} d_j d_{\mathcal{T}}$.

The choice of the covariate-specific basis functions $b_{j,l}(x_i)$ depends on the type of effect considered. For instance, a smooth nonlinear effect can be modeled via B-splines. Similarly, the choice of the basis functions $\tilde{b}_{\mathcal{T}}$ depends on the reference measure $\mu$. Maier et al. (2025b) describe constructions based on transformations of B-splines and of indicators for the continuous and the discrete part, respectively.

In principle, $\theta$ can be estimated via maximum likelihood, with likelihood and log-likelihood functions given by

$$L(\theta) = \prod_{i=1}^{n} \frac{\exp\left( \tilde{f}_i(y_i) \right)}{\int_{\mathcal{T}} \exp\left( \tilde{f}_i \right) d\mu},$$

$$\ell(\theta) = \sum_{i=1}^{n} \left[ \tilde{f}_i(y_i) - \log \int_{\mathcal{T}} \exp(\tilde{f}_i) \, d\mu \right] = \sum_{i=1}^{n} \left[ \left( b(x_i) \otimes \tilde{b}_{\mathcal{T}}(y_i) \right)^{\top} \theta - \log \int_{\mathcal{T}} \exp\left( \left( b(x_i) \otimes \tilde{b}_{\mathcal{T}} \right)^{\top} \theta \right) d\mu \right].$$

To increase the smoothness of the estimated densities, it is also possible to include additional penalty terms for the coefficients, and thus to consider a penalized log-likelihood.
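To make the clr machinery concrete, the following minimal sketch evaluates the transformation numerically on a grid for the continuous case. It is an illustration under stated assumptions, not the implementation of Maier et al. (2025a): the grid, the two example densities, and the helper names (`trapezoid`, `normalise`, `clr`, `clr_inverse`, `perturb`) are all hypothetical, and integrals are approximated by the trapezoidal rule.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of `values` over `grid`."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )

def normalise(values, grid):
    """Rescale positive values so that they integrate to one on the grid."""
    total = trapezoid(values, grid)
    return [v / total for v in values]

def clr(density, grid):
    """Centered log-ratio transform: log f minus its average over the interval."""
    log_f = [math.log(v) for v in density]
    mean_log = trapezoid(log_f, grid) / (grid[-1] - grid[0])
    return [v - mean_log for v in log_f]

def clr_inverse(f_tilde, grid):
    """Inverse clr: exponentiate, then pick the representative integrating to one."""
    return normalise([math.exp(v) for v in f_tilde], grid)

def perturb(f1, f2, grid):
    """Bayes space addition: pointwise product, renormalised."""
    return normalise([a * b for a, b in zip(f1, f2)], grid)

# Hypothetical densities on [0.01, 0.99], chosen only for illustration.
grid = [0.01 + 0.98 * i / 200 for i in range(201)]
f1 = normalise([t * (1.0 - t) ** 2 for t in grid], grid)
f2 = normalise([math.exp(2.0 * t) for t in grid], grid)

clr_f1 = clr(f1, grid)
clr_f2 = clr(f2, grid)
clr_sum = clr(perturb(f1, f2, grid), grid)   # clr of f1 (+) f2
recovered = clr_inverse(clr_f1, grid)        # should reproduce f1
```

On the clr scale, Bayes space addition becomes ordinary addition of functions, which is why the additive model (5) turns into a linear predictor after the transformation.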
The estimation can be computationally challenging due to the integral term in the log-likelihood. Following Maier et al. (2025a), we instead estimate $\theta$ by approximating the problem via additive Poisson regression.

3.3 Estimation via Multinomial and Poisson Regression

To reduce the computational burden, the estimation procedure can be approximated using a (shifted) multinomial log-likelihood. In this section, we focus on the continuous case for notational simplicity, although Maier et al. (2025a) show that the discrete and mixed cases can also be covered. For this purpose, we partition the support of $Y$ into histogram bins. Let $a = a_0 < a_1 < \ldots < a_G = b$. Then $U_g = [a_{g-1}, a_g)$ for $g = 1, \ldots, G-1$ and $U_G = [a_{G-1}, a_G]$ partition the interval $[a, b]$. The histogram counts are $n_{ig} = \mathbb{1}\{y_i \in U_g\}$, and the corresponding bin widths are $\Delta_g = a_g - a_{g-1}$ for $g = 1, \ldots, G$. Further, denote the center of histogram bin $U_g$ by $u_g$. The vector $(n_{i1}, \ldots, n_{iG})$ can be viewed as a realization of a multinomial variable with sample size 1 and class probabilities

$$p_{ig}(\theta) = \frac{\Delta_g \exp\left( \left( b(x_i) \otimes \tilde{b}_{\mathcal{T}}(u_g) \right)^{\top} \theta \right)}{\sum_{k=1}^{G} \Delta_k \exp\left( \left( b(x_i) \otimes \tilde{b}_{\mathcal{T}}(u_k) \right)^{\top} \theta \right)}.$$

Up to an additive constant, the multinomial log-likelihood is

$$\ell_{mn}(\theta) = \sum_{i=1}^{n} \sum_{g=1}^{G} n_{ig} \left( \left( b(x_i) \otimes \tilde{b}_{\mathcal{T}}(u_g) \right)^{\top} \theta - \log \sum_{k=1}^{G} \Delta_k \exp\left( \left( b(x_i) \otimes \tilde{b}_{\mathcal{T}}(u_k) \right)^{\top} \theta \right) \right) + \text{const}.$$

Maier et al. (2025a) show that the multinomial log-likelihood converges to the Bayes Hilbert space log-likelihood as the maximal bin width approaches zero, and that the corresponding maximum likelihood estimator and the inverse Fisher information used for inference converge as well. Further, they show the equivalence of the multinomial likelihood and a certain Poisson likelihood.
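The binning step can be sketched in a few lines. The example below is a simplified illustration, assuming a single covariate combination so that the linear predictor depends on $y$ only; the function names and the choice $\eta(t) = \theta (t - 0.5)$ on $[0, 1]$ are hypothetical. It computes the class probabilities $p_g$ from the bin widths and bin centers, compares $p_g / \Delta_g$ with the exact density implied by $\eta$, and evaluates the multinomial log-likelihood for a few observations.

```python
import math

def bin_probabilities(edges, eta):
    """Class probabilities p_g proportional to Delta_g * exp(eta(u_g))."""
    widths = [edges[g + 1] - edges[g] for g in range(len(edges) - 1)]
    centers = [0.5 * (edges[g] + edges[g + 1]) for g in range(len(edges) - 1)]
    weights = [w * math.exp(eta(u)) for w, u in zip(widths, centers)]
    total = sum(weights)
    return [w / total for w in weights], widths, centers

def bin_index(y, edges):
    """Index of the half-open bin containing y; the last bin is closed."""
    n_bins = len(edges) - 1
    for g in range(n_bins):
        if edges[g] <= y < edges[g + 1]:
            return g
    if y == edges[-1]:
        return n_bins - 1
    raise ValueError("observation outside the support")

def multinomial_loglik(ys, edges, eta):
    """Sum of log class probabilities of the bins that contain the observations."""
    probs, _, _ = bin_probabilities(edges, eta)
    return sum(math.log(probs[bin_index(y, edges)]) for y in ys)

THETA = 2.0  # hypothetical coefficient

def eta(t):
    """clr-scale linear predictor on [0, 1]; it integrates to zero."""
    return THETA * (t - 0.5)

def true_density(t):
    """Exact density proportional to exp(eta) on [0, 1]."""
    return THETA * math.exp(THETA * t) / (math.exp(THETA) - 1.0)

G = 200
edges = [g / G for g in range(G + 1)]
probs, widths, centers = bin_probabilities(edges, eta)
max_gap = max(abs(p / w - true_density(u)) for p, w, u in zip(probs, widths, centers))
loglik = multinomial_loglik([0.1, 0.5, 0.9, 1.0], edges, eta)
```

With finer bins, `max_gap` shrinks, which mirrors the convergence of the multinomial approximation to the continuous likelihood as the maximal bin width goes to zero.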
In particular, for computational reasons it is beneficial to pool observations that share the same combination of covariates and to fit the histogram counts using a Poisson model with an additional intercept parameter for each unique covariate combination. We can thus rely on additive Poisson regression to estimate the parameter vector $\theta$.

3.4 Uncertainty Quantification

Having introduced the additive density regression framework and the associated estimation procedure, we now turn to uncertainty quantification. In particular, it is of major interest to empirical researchers whether the distribution and covariate effects are significant, i.e., different from one. For this purpose, we rely on the asymptotic results of Maier et al. (2025a). We propose drawing values $\theta_k^{(b)}$, $b = 1, \ldots, B$, of the regression parameters from the $(1 - \alpha)$ Wald confidence regions defined in Lemma A.15 of that paper, for a given significance level $\alpha$. For each simulation iteration, we obtain the corresponding conditional density from the simulated parameter values and the fixed basis functions,

$$\hat{f}^{(b)}_{Y_k \mid X}(y \mid x) = \mathrm{clr}^{-1}\left[ \left( b(x) \otimes \tilde{b}_{\mathcal{T}} \right)^{\top} \theta_k^{(b)} \right](y), \quad k = 0, 1.$$

Further, we calculate the respective counterfactual densities by integrating with respect to the covariate distributions of the treatment and control groups,

$$\hat{f}^{(b)}_{Y\langle k,l \rangle}(y) = \int_{\mathcal{X}_l} \hat{f}^{(b)}_{Y_k \mid X}(y \mid x) \, d\hat{F}_{X_l}(x), \quad k, l = 0, 1.$$

Since the conditional densities of both groups are estimated independently, we can follow the above procedure separately for each group. We thus obtain estimates of the respective density effects based on simulations from the asymptotic confidence regions of the model parameters. These estimates can be plotted alongside the original estimate of the respective density effect and thus serve to quantify estimation uncertainty.
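The simulation-based procedure can be sketched for a toy model with two parameters per group and no covariates, so that the density effect reduces to the ratio of the two group densities. Everything specific here is an assumption made for illustration: the basis functions, the point estimates and covariance matrices, and in particular the sampling scheme (normal draws rejected until they fall inside the Wald ellipsoid), since the text does not spell out how draws from the confidence region are generated. For two parameters, the chi-square quantile at level $1 - \alpha$ has the closed form $-2 \log \alpha$.

```python
import math
import random

def basis(t):
    """Two clr-scale basis functions on [0, 1]; both integrate to zero."""
    return (t - 0.5, (t - 0.5) ** 2 - 1.0 / 12.0)

def density_on_grid(theta, grid):
    """Exponentiate the linear predictor and normalise (uniform-grid trapezoid)."""
    raw = [math.exp(sum(c * b for c, b in zip(theta, basis(t)))) for t in grid]
    step = grid[1] - grid[0]
    total = step * (sum(raw) - 0.5 * (raw[0] + raw[-1]))
    return [v / total for v in raw]

def mahalanobis_sq(theta, center, cov):
    """Squared Mahalanobis distance for a 2 x 2 covariance matrix."""
    (s11, s12), (_, s22) = cov
    d1, d2 = theta[0] - center[0], theta[1] - center[1]
    det = s11 * s22 - s12 * s12
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det

def draw_from_wald_region(center, cov, alpha, rng):
    """Rejection-sample a normal draw until it lies inside the Wald ellipsoid."""
    (s11, s12), (_, s22) = cov
    l11 = math.sqrt(s11)                 # Cholesky factor of the covariance
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    radius_sq = -2.0 * math.log(alpha)   # chi-square(2) quantile at 1 - alpha
    while True:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if z1 * z1 + z2 * z2 <= radius_sq:
            return (center[0] + l11 * z1, center[1] + l21 * z1 + l22 * z2)

rng = random.Random(0)
grid = [i / 100 for i in range(101)]
alpha, B = 0.05, 100
# Hypothetical estimates for the treated (1) and control (0) group.
theta_hat = {1: (0.8, -1.0), 0: (0.2, -0.5)}
cov_hat = {1: ((0.010, 0.002), (0.002, 0.020)),
           0: ((0.008, -0.001), (-0.001, 0.015))}

all_draws, effect_draws = [], []
for _ in range(B):
    theta_b = {k: draw_from_wald_region(theta_hat[k], cov_hat[k], alpha, rng)
               for k in (0, 1)}
    all_draws.append(theta_b)
    f1 = density_on_grid(theta_b[1], grid)
    f0 = density_on_grid(theta_b[0], grid)
    effect_draws.append([a / b for a, b in zip(f1, f0)])

# Pointwise envelope of the simulated density effects.
lower = [min(curve[i] for curve in effect_draws) for i in range(len(grid))]
upper = [max(curve[i] for curve in effect_draws) for i in range(len(grid))]
```

Plotting the curves in `effect_draws` together with the point estimate gives the kind of visual uncertainty band described above; with covariates, each draw would additionally be averaged over the empirical covariate distribution.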
4 Simulation Study

4.1 Simulation Setup

In this section, we study the finite-sample performance of our proposed estimator for the counterfactual densities in a simulation study. To enable a comparison with an alternative estimator for the conditional densities, we restrict our attention to a data-generating process with categorical covariates and a continuous dependent variable. We consider a setting with $d = 3$ covariates, each taking two possible values. We assume the additive model specification for the conditional densities introduced in (5). The partial effects $h_j$ are generated as beta density functions with different parameter values, $h_j(x) = x_j \odot \beta_j$, $j = 1, 2, 3$. As a consequence, the conditional densities are also beta densities. We consider sample sizes ranging from $n = 500$ to $n = 100{,}000$. Further details on the data-generating process and other aspects of the simulation study are provided in Section B of the supplementary material. It should be noted that our model is not correctly specified, as the spline basis functions only approximate the true conditional densities.

The performance is evaluated by the total variation (TV) distance between the true and estimated counterfactual densities. We further evaluate the estimation accuracy of the conditional densities using the same metric. As a benchmark, we compare the estimation accuracy with an alternative approach based on kernel density estimation with a Gaussian kernel and Silverman's rule-of-thumb bandwidth, carried out separately for every covariate combination, which is possible in this case because all covariates are binary. All simulation results are based on 1,000 Monte Carlo iterations.

4.2 Simulation Results

Table 1 reports the estimation accuracy of the four counterfactual densities in terms of the TV distance between estimated and true densities.
As expected, the estimation becomes more accurate with increasing sample size. The table further reports the estimation accuracy of the benchmark method based on kernel density estimates of the conditional densities. The performance of both methods is quite similar in small samples, whereas the Bayes Hilbert space approach has a slight advantage in settings with larger samples ($n = 10{,}000$ and $n = 20{,}000$). However, for the largest sample size ($n = 100{,}000$), the performance of the two methods appears to converge.

| $n$ | Bayes Hilbert $f_{Y\langle 1,1 \rangle}$ | Bayes Hilbert $f_{Y\langle 1,0 \rangle}$ | Bayes Hilbert $f_{Y\langle 0,1 \rangle}$ | Bayes Hilbert $f_{Y\langle 0,0 \rangle}$ | Kernel density $f_{Y\langle 1,1 \rangle}$ | Kernel density $f_{Y\langle 1,0 \rangle}$ | Kernel density $f_{Y\langle 0,1 \rangle}$ | Kernel density $f_{Y\langle 0,0 \rangle}$ |
|---|---|---|---|---|---|---|---|---|
| 500 | 0.036 | 0.042 | 0.049 | 0.048 | 0.036 | 0.040 | 0.052 | 0.048 |
| 1,000 | 0.028 | 0.032 | 0.038 | 0.037 | 0.028 | 0.032 | 0.039 | 0.036 |
| 5,000 | 0.015 | 0.017 | 0.021 | 0.019 | 0.016 | 0.019 | 0.021 | 0.021 |
| 10,000 | 0.011 | 0.013 | 0.015 | 0.014 | 0.013 | 0.016 | 0.016 | 0.017 |
| 20,000 | 0.009 | 0.010 | 0.012 | 0.011 | 0.010 | 0.013 | 0.012 | 0.013 |
| 100,000 | 0.006 | 0.006 | 0.008 | 0.008 | 0.006 | 0.008 | 0.007 | 0.008 |

Table 1: Total variation distance between estimated and true counterfactual densities. The Bayes Hilbert columns are based on conditional densities estimated with the Bayes Hilbert space approach; the kernel density columns are based on kernel density estimation.

Since estimating the conditional densities is the key step of our methodology, we separately analyze its accuracy using the TV distance between the true and estimated conditional densities, comparing the Bayes Hilbert space approach with kernel density estimation. To make the results easier to interpret, we aggregate them by averaging the TV distances over all covariate combinations. Interestingly, the results in Table 2 tell a different story from the results for the unconditional counterfactual densities. In fact, the estimation accuracy of the Bayes Hilbert space approach is higher in all the settings we consider. The reason is that the method makes explicit use of the multiplicative structure of the conditional densities. To illustrate this point, for the estimation of one particular conditional density, $f_{Y_j \mid X}(y \mid X = x)$, the Bayes Hilbert space approach can borrow strength from data that do not belong to this conditional density. That is, an observation $i$ with $X_i \neq x$ can still affect the estimate of the conditional density. In contrast, the kernel density estimator can only make use of observations for which $X_i = x$. However, since the estimation of the counterfactual densities involves averaging over the estimated conditional densities, the kernel density estimator has the advantage that the estimates of the different conditional densities are statistically independent. Therefore, the results for the counterfactual densities in Table 1 are less clear-cut than the results in Table 2.

We point out that, for comparison purposes, the simulation study is restricted to discrete regressors and a continuous outcome variable. An additional advantage of the Bayes Hilbert space approach is that it can easily be applied to settings with continuous regressors as well as discrete and mixed-type outcome variables. To account for continuous regressors, the kernel density benchmark would also need to smooth in the covariate dimension, which would make the estimation of the conditional densities much more difficult. In contrast, the Bayes Hilbert space approach can be readily applied in these settings as well.
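The evaluation metric and the kernel benchmark can be sketched as follows. This is a small illustration under assumptions, not the simulation code of the paper: it uses a single Beta(2, 3) density rather than the design from the supplement, it takes Silverman's rule in the common form $0.9 \min(\hat{\sigma}, \mathrm{IQR}/1.34) \, n^{-1/5}$ (the text does not state which variant is used), and the function names are hypothetical.

```python
import math
import random
import statistics

def silverman_bandwidth(sample):
    """Rule of thumb 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    spread = min(statistics.stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * len(sample) ** (-0.2)

def gaussian_kde(sample, grid):
    """Gaussian kernel density estimate evaluated on the grid."""
    h = silverman_bandwidth(sample)
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((t - y) / h) ** 2) for y in sample)
        for t in grid
    ]

def tv_distance(f, g, grid):
    """Total variation distance, 0.5 * integral of |f - g| (trapezoidal rule)."""
    gaps = [abs(a - b) for a, b in zip(f, g)]
    return 0.5 * sum(
        0.5 * (gaps[i] + gaps[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )

rng = random.Random(1)
grid = [i / 200 for i in range(201)]
true_f = [12.0 * t * (1.0 - t) ** 2 for t in grid]  # Beta(2, 3) density

tv_by_n = {}
for n in (200, 2000):
    sample = [rng.betavariate(2.0, 3.0) for _ in range(n)]
    tv_by_n[n] = tv_distance(gaussian_kde(sample, grid), true_f, grid)
```

In the setting of the simulation study, this estimator would be applied separately within each of the eight covariate cells, so each cell uses only its own observations.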
| $n$ | Bayes Hilbert $f_{Y_1 \mid X}$ | Bayes Hilbert $f_{Y_0 \mid X}$ | Kernel density $f_{Y_1 \mid X}$ | Kernel density $f_{Y_0 \mid X}$ |
|---|---|---|---|---|
| 500 | 0.064 | 0.073 | 0.086 | 0.096 |
| 1,000 | 0.047 | 0.056 | 0.066 | 0.074 |
| 5,000 | 0.025 | 0.032 | 0.037 | 0.041 |
| 10,000 | 0.019 | 0.024 | 0.029 | 0.032 |
| 20,000 | 0.015 | 0.018 | 0.022 | 0.025 |
| 100,000 | 0.008 | 0.011 | 0.013 | 0.014 |

Table 2: Average total variation distance between estimated and true conditional densities (averaged over all conditional densities). The Bayes Hilbert columns are based on conditional densities estimated with the Bayes Hilbert space approach; the kernel density columns are based on kernel density estimation.

5 Decomposing the East–West Income Gap in Germany

5.1 Data and Background

We apply our counterfactual density framework to analyze the East–West gap in gross incomes in Germany. Most existing studies on this subject use a location-based definition of Easterners and Westerners, irrespective of the place of birth and socialization (Burda et al., 1997; Kluge and Weber, 2018). However, this approach is likely to suffer from endogeneity bias. Place of residence and place of work are to a large extent choice variables that might depend on latent factors, thereby distorting the effects of interest. In contrast to these studies, Dickey and Widmaier (2021) consider both a location-based and an origin-based East–West gap. These existing papers use Oaxaca–Blinder type decomposition methods, which are based on mean regression, quantile regression, or the unconditional quantile regression approach of Fortin et al. (2011). We use our decomposition approach for densities, which allows us to look at effects on the whole distribution in an interpretable way while also covering zero as well as positive incomes. Due to the aforementioned endogeneity issues, we restrict our analysis to an origin-based definition of the East–West gap.
The data are obtained from the Socio-Economic Panel (SOEP), which provides person-specific information on demographic and socio-economic aspects (see Goebel et al. (2019)). We identify Easterners using the variable 'loc1989', which provides information on the place of residence immediately before the fall of the Berlin Wall and German reunification. If this information is not available, we classify a person as an Easterner if their birth region lies geographically in the East. This secondary identification is essential for categorizing individuals born after 1989. Formally, this is done using the variable 'birthregion_ew'. We restrict our analysis to individuals aged 18–67. We further exclude retirees, students (both at school and at university), and persons with disabilities. To account for over- and under-representation of certain demographic groups, we use the corresponding cross-sectional individual weights provided in the dataset.

The dependent variable is monthly gross labor income, which includes both primary and secondary income sources. To ensure comparability across time, all income values are adjusted to 2021 price levels. The density of this variable is of mixed type, with a point mass at zero. Additionally, we cap the support of the income variable at EUR 10,000 and assign values above this threshold to the category 10,000+, which is represented by a second point mass. As covariates, we consider sex ($x_{sex}$); a categorical variable for level of education ($x_{edu}$); an indicator variable for whether the place of residence is urban or rural ($x_{rural}$); and a categorical variable for employer size ($x_{size}$). Apart from these categorical explanatory variables, we also include the age of the individual as a smooth effect.
Following the logic of the Oaxaca–Blinder approach, we run two separate regressions for Easterners and Westerners, with the following additive density regression model,

$$f_{Y_k \mid X}(y \mid x) = \beta_{0,k}(y) \oplus \beta_{edu,k}(y, x_{edu}) \oplus \beta_{rural,k}(y, x_{rural}) \oplus \beta_{size,k}(y, x_{size}) \oplus \beta_{sex,k}(y, x_{sex}) \oplus g_k(y, x_{age}),$$

where $k = 1$ for the East and $k = 0$ for the West. The partial effects $\beta_{j,k}$ denote group-specific intercepts for the categories of the $j$-th categorical variable, where $\beta_{j,k} = 0_{B^2}$, the additive neutral element of the Bayes Hilbert space, for the respective reference category, and $g_k(\cdot)$ represents a smooth effect for age. Since East Germans are our treatment group and West Germans the control group, the distribution and covariate effects are defined as

$$\mathrm{DE}(y) = \frac{f_{Y\langle \mathrm{East}, \mathrm{East} \rangle}(y)}{f_{Y\langle \mathrm{West}, \mathrm{East} \rangle}(y)} \quad \text{and} \quad \mathrm{CE}(y) = \frac{f_{Y\langle \mathrm{West}, \mathrm{East} \rangle}(y)}{f_{Y\langle \mathrm{West}, \mathrm{West} \rangle}(y)},$$

respectively. To account for changes over time, we conduct separate analyses for the years 1991 ($n_0 = 3{,}132$, $n_1 = 6{,}764$), 2001 ($n_0 = 3{,}936$, $n_1 = 9{,}929$), 2011 ($n_0 = 4{,}893$, $n_1 = 12{,}781$), and 2021 ($n_0 = 2{,}447$, $n_1 = 6{,}433$). For the years 1991 and 2001, we choose smaller upper bounds for the continuous part of the income distribution: EUR 5,000 and EUR 7,000, respectively.

5.2 Estimation Results

This section presents the estimation results for the counterfactual densities, the corresponding distribution and covariate effects, and their development over time. The results for the East–West gap for 2001 and 2021 are visualized in Figures 1 and 2. We refer to Figures 5 and 6 in Section A of the supplementary material for additional results for 1991 and 2011. To quantify the estimation uncertainty, we add 100 draws from the 95% confidence regions for the distribution and covariate effects, following the procedure outlined in Subsection 3.4.
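How the counterfactual densities and the two effects fit together can be sketched with a toy example. The setup is hypothetical throughout: one binary covariate, made-up conditional densities on a grid, and made-up covariate shares. The sketch averages the conditional densities of group $k$ over the covariate distribution of group $l$ to obtain $f_{Y\langle k,l \rangle}$, then forms DE and CE as ratios and checks that their product equals the total density ratio.

```python
def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of `values` over `grid`."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )

def normalise(values, grid):
    """Rescale positive values so that they integrate to one on the grid."""
    total = trapezoid(values, grid)
    return [v / total for v in values]

def mix(cond_densities, shares):
    """Average conditional densities over a discrete covariate distribution."""
    n_points = len(next(iter(cond_densities.values())))
    return [
        sum(shares[x] * cond_densities[x][i] for x in shares)
        for i in range(n_points)
    ]

grid = [0.01 + 0.98 * i / 200 for i in range(201)]

# Hypothetical conditional densities f_{Y_k | X}(y | x), k = 1 treated, 0 control.
cond = {
    1: {0: normalise([t * (1 - t) ** 3 for t in grid], grid),
        1: normalise([t ** 2 * (1 - t) ** 2 for t in grid], grid)},
    0: {0: normalise([t * (1 - t) ** 2 for t in grid], grid),
        1: normalise([t ** 3 * (1 - t) for t in grid], grid)},
}
# Hypothetical covariate shares P(X = x) in each group.
shares = {1: {0: 0.6, 1: 0.4}, 0: {0: 0.3, 1: 0.7}}

f_11 = mix(cond[1], shares[1])  # treated conditional density, treated covariates
f_01 = mix(cond[0], shares[1])  # control conditional density, treated covariates
f_00 = mix(cond[0], shares[0])  # control conditional density, control covariates

de = [a / b for a, b in zip(f_11, f_01)]     # distribution effect DE(y)
ce = [a / b for a, b in zip(f_01, f_00)]     # covariate effect CE(y)
total = [a / b for a, b in zip(f_11, f_00)]  # observed density ratio
```

Values of `de` or `ce` above one mark income regions where the first density in the ratio places more mass; values below one mark the opposite.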
First, we observe that the discrepancies between income densities for East and West have decreased substantially over recent decades. This confirms both previous empirical findings and theoretical predictions about East–West wage convergence. Still, the distribution effect for 2021 shows that differences persist in the lower as well as the upper regions of the income distribution. For instance, even if West Germans had the same age structure and demographic characteristics as East Germans, they would be far more likely to be in the upper tail of the income distribution. As a second major empirical result, we find that the density-based East–West gap can be overwhelmingly attributed to the distribution effect, with the covariate effect playing only a minor role. Thus, the observed differences cannot be well explained by factors such as education or urbanization.

[Figure 1 plot omitted. Panel titles: "Distribution Effect" (curves f_east_east, f_west_east) and "Covariate Effect" (curves f_west_east, f_west_west).]

Figure 1: Decomposition of the total density effect into distribution effect (left panels) and covariate effect (right panels) for the year 2001. The top panels show the estimated counterfactual densities, $f_{Y\langle \mathrm{East}, \mathrm{East} \rangle}$, $f_{Y\langle \mathrm{West}, \mathrm{East} \rangle}$, and $f_{Y\langle \mathrm{West}, \mathrm{West} \rangle}$. The lower panels show the estimated density effects DE($y$) and CE($y$), with 100 draws from the 95% confidence region.

To illustrate the benefit of our density-focused analysis, we include a comparison with a classical (additive) Oaxaca–Blinder decomposition for the mean incomes of the treatment (East) and control (West) groups.
The left side of Table 3 makes clear that a mean-based analysis provides a simple, scalar summary that is straightforward to interpret. However, this interpretability comes at the expense of not being able to capture the nuances of the differences. For example, the higher average income of Westerners can stem either from a large share of high incomes or from a very low share of low incomes. Empirically, the results reaffirm our finding that the covariate effect is dominated by the distribution effect, and that the latter is decreasing in magnitude over time. Interestingly, the covariate effect is positive in the years following reunification, but its sign switches between 2001 and 2011.

[Figure 2 plot omitted. Panel titles: "Distribution Effect" (curves f_east_east, f_west_east) and "Covariate Effect" (curves f_west_east, f_west_west).]

Figure 2: Decomposition of the total density effect into distribution effect (left panels) and covariate effect (right panels) for the year 2021. The top panels show the estimated counterfactual densities, $f_{Y\langle \mathrm{East}, \mathrm{East} \rangle}$, $f_{Y\langle \mathrm{West}, \mathrm{East} \rangle}$, and $f_{Y\langle \mathrm{West}, \mathrm{West} \rangle}$. The lower panels show the estimated density effects DE($y$) and CE($y$), with 100 draws from the 95% confidence region.

We also compare our results with the 'density effects' proposed by Kennedy et al. (2023), which are again a scalar measure of the discrepancy between two (counterfactual) densities. As a metric we choose the total variation distance, and we also decompose the effect additively into a distribution and a covariate effect. For simplicity, we use the same estimated counterfactual densities as in our main analysis.
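Both scalar summaries can be computed directly from counterfactual densities, as the following sketch shows. The three densities are hypothetical gamma-shaped curves on an income grid from 0 to 10,000, not estimates from the SOEP data, and the function names are made up. For the means, the distribution and covariate components add up exactly to the total gap. For the total variation distance, the sketch simply reports the two pairwise distances; how the paper aggregates them is not spelled out in the text, and in general the pairwise distances only bound the total distance from above through the triangle inequality.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of `values` over `grid`."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )

def normalise(values, grid):
    """Rescale non-negative values so that they integrate to one on the grid."""
    total = trapezoid(values, grid)
    return [v / total for v in values]

def mean_of(density, grid):
    """Mean of the distribution with the given density."""
    return trapezoid([t * v for t, v in zip(grid, density)], grid)

def tv_distance(f, g, grid):
    """Total variation distance, 0.5 * integral of |f - g|."""
    return 0.5 * trapezoid([abs(a - b) for a, b in zip(f, g)], grid)

grid = [50.0 * i for i in range(201)]  # incomes from 0 to 10,000

def gamma_shaped(scale):
    """Hypothetical income density proportional to t^2 * exp(-t / scale)."""
    return normalise([t ** 2 * math.exp(-t / scale) for t in grid], grid)

f_11 = gamma_shaped(700.0)    # treated conditional density, treated covariates
f_01 = gamma_shaped(900.0)    # control conditional density, treated covariates
f_00 = gamma_shaped(1000.0)   # control conditional density, control covariates

# Additive mean decomposition: total gap = distribution part + covariate part.
mean_de = mean_of(f_11, grid) - mean_of(f_01, grid)
mean_ce = mean_of(f_01, grid) - mean_of(f_00, grid)
mean_gap = mean_of(f_11, grid) - mean_of(f_00, grid)

# Scalar density discrepancies based on the total variation distance.
tv_de = tv_distance(f_11, f_01, grid)
tv_ce = tv_distance(f_01, f_00, grid)
tv_total = tv_distance(f_11, f_00, grid)
```

Neither summary reveals where along the income axis the densities differ, which is the information the pointwise ratios DE($y$) and CE($y$) retain.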
The results on the right-hand side of Table 3 show the fundamental limitation of this approach. Even if the approach can reliably detect discrepancies between the estimated densities beyond simple mean shifts, it cannot identify the regions of the income distribution that are responsible for the discrepancy. An additional disadvantage is the lack of information about the direction of the effect across the distribution. This information is of utmost importance for policymakers. We therefore argue that scalar measures of discrepancies between densities can be beneficial as an additional tool for analysis, but that important information about the direction and location of the effect is lost in the process.

| Year | Mean difference: DE | Mean difference: CE | TV distance: DE | TV distance: CE |
|---|---|---|---|---|
| 1991 | –1426 | 349 | 0.558 | 0.072 |
| 2001 | –703 | 102 | 0.225 | 0.025 |
| 2011 | –481 | –71 | 0.149 | 0.041 |
| 2021 | –213 | –145 | 0.097 | 0.034 |

Table 3: Results for the Oaxaca–Blinder type decomposition, i.e., the distribution effect (DE) and the covariate effect (CE), for mean differences (left side of the table) and for the total variation distance between counterfactual densities (right side of the table), following Kennedy et al. (2023).

In the previous analysis, we assumed that the sex variable enters additively in the two income regressions. However, the decomposition of the East–West income gap may be fundamentally different for men and women. We therefore present additional estimation results for counterfactual densities and density effects based on separate estimations for men and women for 2001. Figures 3 and 4 indeed show that the East–West gap is more pronounced for the male population. This can partly be explained by the larger share of part-time work for women in the West compared to the East. Indeed, one can see that the distribution effect is below one in the lower regions of the income distribution.
That is, relatively more women have a low income in West Germany than in East Germany. In the upper tail regions of the income distribution, in contrast, the effects for women are similar to those for men, but less pronounced.

In the following, we further analyze the covariate effect by isolating the impact of single variables using equation (3). We refer to Figure 7 in the supplementary material for the contributions to the covariate effect in 2001 of the variables education, $\mathrm{CE}_{edu}(y)$, and age, $\mathrm{CE}_{age}(y)$. For education, the contribution to the covariate effect is significantly below one in the lower income regions, which implies that the share of East Germans in these regions is higher despite, and not because of, differences in education. For age, we do not observe significant effects in any direction.

Finally, in Section A.2 of the supplement, we include a robustness check with an additional categorical variable controlling for the industrial sector of the main job. Including this variable leads to an endogeneity issue, since having an industry code presupposes that the person is employed, which is why we do not include it in our main empirical analysis. Due to this issue, we need to restrict the analysis to the continuous part of the income distribution. We show that the results are indeed robust to controlling for industry, i.e., the inclusion of the variable cannot explain the remaining discrepancies in the income distributions between East and West.

[Figure 3 plot omitted. Panel titles: "Distribution Effect" (curves f_east_east, f_west_east) and "Covariate Effect" (curves f_west_east, f_west_west).]

Figure 3: Decomposition of the total density effect into distribution effect (left panels) and covariate effect (right panels) for men in the year 2001. The top panels show the estimated counterfactual densities, $f_{Y\langle \mathrm{East}, \mathrm{East} \rangle}$, $f_{Y\langle \mathrm{West}, \mathrm{East} \rangle}$, and $f_{Y\langle \mathrm{West}, \mathrm{West} \rangle}$. The lower panels show the estimated density effects DE($y$) and CE($y$), with 100 draws from the 95% confidence region.

[Figure 4 plot omitted. Panel titles: "Distribution Effect" (curves f_east_east, f_west_east) and "Covariate Effect" (curves f_west_east, f_west_west).]

Figure 4: Decomposition of the total density effect into distribution effect (left panels) and covariate effect (right panels) for women in the year 2001. The top panels show the estimated counterfactual densities, $f_{Y\langle \mathrm{East}, \mathrm{East} \rangle}$, $f_{Y\langle \mathrm{West}, \mathrm{East} \rangle}$, and $f_{Y\langle \mathrm{West}, \mathrm{West} \rangle}$. The lower panels show the estimated density effects DE($y$) and CE($y$), with 100 draws from the 95% confidence region.

6 Conclusion

In this paper, we presented a new framework for conducting causal inference based on counterfactual densities. The approach is based on a multiplicative Oaxaca–Blinder type decomposition of the densities of the treatment and control groups into a distribution and a covariate effect. To estimate the conditional densities, we rely on the Bayes Hilbert space additive density regression model of Maier et al. (2025a). As an application of our approach, we analyze the German East–West income gap. We find that differences between income densities decreased over time and that the decline can mainly be attributed to changes in the conditional distribution. In contrast, differences in the covariate distribution between East and West play only a minor role.
Additionally, we find that the East–West gap is much more pronounced for the male sub-population.

A major advantage of our density-based approach is that it is able to capture high degrees of heterogeneity in the causal effects. Further, visualization of the counterfactual densities and corresponding effects allows for an intuitive interpretation of the modeled effects. Compared to alternative approaches based on quantiles, our approach allows for mixed types of the dependent variable.

There are some limitations to our approach, which can be the subject of future research. First, the computational complexity of the Poisson estimation problem can be substantial in the case of large sample sizes and continuous explanatory variables. Second, it is currently assumed that the underlying additive density regression model is correctly specified. Future research could analyze the effect of misspecification on the estimation results.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

Georg Keilbar and Sonja Greven gratefully acknowledge financial support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 513634041.

References

Blinder, A. S. (1973). Wage discrimination: reduced form and structural estimates. Journal of Human Resources, pages 436–455.

Burda, M. C., Schmidt, C. M., et al. (1997). Getting behind the East-West wage differential: Theory and evidence. Citeseer.

Cattaneo, M. D., Chandak, R., Jansson, M., and Ma, X. (2024). Boundary adaptive local polynomial conditional density estimators. Bernoulli, 30(4):3193–3223.

Chernozhukov, V., Fernández-Val, I., and Galichon, A. (2010). Quantile and probability curves without crossing. Econometrica, 78(3):1093–1125.

Chernozhukov, V., Fernández-Val, I., and Melly, B. (2013). Inference on counterfactual distributions. Econometrica, 81(6):2205–2268.
Chernozh uko v, V. and Hansen, C. (2005). An IV mo del of quan tile treatmen t effects. Ec ono- metric a , 73(1):245–261. Dic key , H. and Widmaier, A. M. (2021). The p ersisten t pay gap b et ween easterners and w esterners in german y: A quarter-century after reunification. Pap ers in R e gional Scienc e , 100(3):605–631. DiNardo, J., F ortin, N. M., and Lemieux, T. (1996). Lab or market institutions and the distribution of w ages, 1973-1992: A semiparametric approach. Ec onometric a , 64(5):1001– 1044. F an, J., Y ao, Q., and T ong, H. (1996). Estimation of conditional densities and sensitivit y measures in nonlinear dynamical systems. Biometrika , 83(1):189–206. 30 Firp o, S. (2007). Efficien t semiparametric estimation of quan tile treatment effects. Ec ono- metric a , 75(1):259–276. Firp o, S. P ., F ortin, N. M., and Lemieux, T. (2018). Decomp osing w age distributions using recen tered influence function regressions. Ec onometrics , 6(2):28. F ortin, N., Lemieux, T., and Firp o, S. (2011). Decomp osition metho ds in economics. In Handb o ok of L ab or Ec onomics , v olume 4, pages 1–102. Elsevier. Go eb el, J., Grabk a, M. M., Liebig, S., Kroh, M., Rich ter, D., Schr¨ oder, C., and Sch upp, J. (2019). The german so cio-economic panel (so ep). Jahrb¨ ucher f ¨ ur National¨ okonomie und Statistik , 239(2):345–360. H¨ ardle, W., M ¨ uller, M., Sp erlich, S., and W erwatz, A. (2004). Nonp ar ametric and semip ar a- metric mo dels . Springer. Kennedy , E., Balakrishnan, S., and W asserman, L. (2023). Semiparametric coun terfactual densit y estimation. Biometrika , 110(4):875–896. Kitaga wa, E. M. (1955). Comp onents of a difference b etw een t w o rates. Journal of the A meric an Statistic al Asso ciation , 50(272):1168–1194. Kluge, J. and W eb er, M. (2018). Decomp osing the German east–west wage gap. Ec onomics of T r ansition , 26(1):91–125. Lin ton, O. and Nielsen, J. P . (1995). 
A kernel metho d of estimating structured nonparametric regression based on marginal in tegration. Biometrika , pages 93–100. Mac hado, J. A. and Mata, J. (2005). Counterfactual decomp osition of changes in w age distributions using quantile regression. Journal of Applie d Ec onometrics , 20(4):445–465. Maier, E.-M., F ottner, A., Greven, S., and St¨ oc ker, A. (2025a). Additive densit y regression. arXiv pr eprint arXiv:2510.14502 . Maier, E.-M., St¨ ock er, A., Fitzen b erger, B., and Grev en, S. (2025b). Additive density-on- scalar regression in bay es hilb ert spaces with an application to gender economics. The A nnals of Applie d Statistics , 19(1):680–700. Martinez-T ab oada, D. and Kennedy , E. (2024). Counterfactual density estimation using k ernel stein discrepancies. In The Twelfth International Confer enc e on L e arning R epr e- sentations . 31 Meln ych uk, V., F rauen, D., and F euerriegel, S. (2023). Normalizing flows for interv entional densit y estimation. In International Confer enc e on Machine L e arning , pages 24361–24397. PMLR. Oaxaca, R. (1973). Male-female wage different ials in urban lab or markets. International Ec onomic R eview , pages 693–709. Rosen baum, P . R. and Rubin, D. B. (1983). The central role of the prop ensit y score in observ ational studies for causal effects. Biometrika , 70(1):41–55. Rothe, C. (2012). P artial distributional p olicy effects. Ec onometric a , 80(5):2269–2301. Rothe, C. (2015). Decomp osing the composition effect: the role of co v ariates in determin- ing b et ween-group differences in economic outcomes. Journal of Business & Ec onomic Statistics , 33(3):323–337. Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandom- ized studies. Journal of Educ ational Psycholo gy , 66(5):688. Stone, C. J. (1991). Asymptotics for doubly flexible logspline resp onse mo dels. The A nnals of Statistics , 19(4):1832–1854. Stone, C. J. (1994). 
Supplementary Material

A Additional Estimation Results

A.1 Additional Figures

This section contains additional figures of the estimated counterfactual densities and density effects.

[Figure 5 image: panels titled "Distribution Effect" (curves f_east_east, f_west_east) and "Covariate Effect" (curves f_west_east, f_west_west).]

Figure 5: Decomposition of the total density effect into distribution effect (left panels) and covariate effect (right panels) for the year 1991. The top panels show the estimated counterfactual densities, $f_{Y\langle \mathrm{East}, \mathrm{East}\rangle}$, $f_{Y\langle \mathrm{West}, \mathrm{East}\rangle}$, and $f_{Y\langle \mathrm{West}, \mathrm{West}\rangle}$. The lower panels show the estimated density effects $\mathrm{DE}(y)$ and $\mathrm{CE}(y)$ with 100 draws from the 95% confidence region.

[Figure 6 image: panels titled "Distribution Effect" (curves f_east_east, f_west_east) and "Covariate Effect" (curves f_west_east, f_west_west).]

Figure 6: Decomposition of the total density effect into distribution effect (left panels) and covariate effect (right panels) for the year 2011. The top panels show the estimated counterfactual densities, $f_{Y\langle \mathrm{East}, \mathrm{East}\rangle}$, $f_{Y\langle \mathrm{West}, \mathrm{East}\rangle}$, and $f_{Y\langle \mathrm{West}, \mathrm{West}\rangle}$.
The lower panels show the estimated density effects $\mathrm{DE}(y)$ and $\mathrm{CE}(y)$ with 100 draws from the 95% confidence region.

[Figure 7 image: panels titled "Education covariate effect" and "Age covariate effect" (curves f_west_east, f_west_west).]

Figure 7: Contribution of the variables education (left panel) and age (right panel) to the covariate effect in the year 2001. The top panels show the estimated counterfactual densities and the lower panels show the estimated covariate effects $\mathrm{CE}_{\mathrm{edu}}(y)$ and $\mathrm{CE}_{\mathrm{age}}(y)$, with 100 draws from the 95% confidence region.

A.2 Robustness Check: Impact of Industry

In this subsection, we provide a robustness check for our results by including an additional industry variable. This variable introduces an endogeneity problem, since membership in a certain industry presupposes that the person is employed and has a positive income. Nonetheless, information on industry may help explain the observed differences in income densities between East and West. For this purpose, we restrict our analysis to the positive part of income, discarding all individuals with zero income. For the industry categorization, we use the first digit of the variables 'pgkldb92' for 1991 and 'pgkldb10' for 2021. See Figures 8 and 9 for the estimation results for the years 1991 and 2021, respectively. We find that the observed discrepancy in (positive) income densities is again dominated by the distribution effect. Including the industry variable does not fundamentally change the results. In particular, the variable's contribution to the covariate effect is rather minor.
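The sample restriction and industry coding described in this subsection can be illustrated with a minimal sketch. The record layout, the `income` key, and the decision to drop records with a missing or non-positive classification code are assumptions made for illustration only; they are not taken from the authors' code. Only the variable name `pgkldb92` comes from the text.

```python
def first_digit(code):
    """Return the leading digit of a positive integer code, else None."""
    if code is None or code <= 0:
        return None  # assumption: non-positive codes are treated as missing
    return int(str(int(code))[0])


def prepare_positive_income_sample(records, income_key="income", code_key="pgkldb92"):
    """Keep records with positive income and attach a first-digit industry group.

    `income_key` is a hypothetical field name; `code_key` is the
    classification variable named in the text.
    """
    sample = []
    for rec in records:
        if rec[income_key] <= 0:
            continue  # discard individuals with zero income
        group = first_digit(rec.get(code_key))
        if group is None:
            continue  # assumption: drop records without a usable code
        sample.append({**rec, "industry_group": group})
    return sample


# Toy records, invented purely to exercise the function.
records = [
    {"income": 2400.0, "pgkldb92": 7812},
    {"income": 0.0, "pgkldb92": 2710},
    {"income": 1800.0, "pgkldb92": -1},
    {"income": 3100.0, "pgkldb92": 611},
]
sample = prepare_positive_income_sample(records)
```

For the 2021 data, the same function could be called with `code_key="pgkldb10"`.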
[Figure 8 image: panels titled "Distribution Effect" (curves f_east_east, f_west_east), "Covariate Effect" (curves f_west_east, f_west_west), and "Industry Covariate Effect" (curves f_west_east, f_west_west).]

Figure 8: Distribution effect, covariate effect, and industry's contribution to the covariate effect for the estimation of the continuous part of the income densities in 1991.

[Figure 9 image: panels titled "Distribution Effect" (curves f_east_east, f_west_east), "Covariate Effect" (curves f_west_east, f_west_west), and "Industry Covariate Effect" (curves f_west_east, f_west_west).]

Figure 9: Distribution effect, covariate effect, and industry's contribution to the covariate effect for the estimation of the continuous part of the income densities in 2021.

B Additional Simulation Details

This section provides additional details about the simulation settings and estimation methods used in Section 4 of the main text. For both the treatment and the control group, we simulate $n$ observations of the covariates from a multinomial distribution. For the treatment group, we use uniform class probabilities; for the control group, we set the class probabilities to $(0.25, 0.2, 0.14, 0.125, 0.095, 0.008, 0.06, 0.05)$. That is, there are 8 covariate combinations: the first category has binary covariates $x_1 = x_2 = x_3 = 1$, the second category has covariates $x_1 = x_2 = 1$, $x_3 = 2$, etc.
For each simulated covariate observation, we simulate the corresponding dependent variable from the conditional density, which we choose to be a beta distribution with parameters $\alpha_1 = \beta_1 = (1, 5, 5, 9, 2, 6, 6, 10)$ for the treatment group, and $\alpha_0 = (1, 10, 2, 11, 2, 11, 3, 12)$ and $\beta_0 = (1, 2, 10, 11, 2, 3, 11, 12)$ for the control group. The true counterfactual densities are therefore mixtures of beta distributions.

For the estimation of the conditional densities using the Bayes Hilbert space approach, we first discretize the support of the dependent variable into 50 equally sized histogram bins. The number of spline basis functions is fixed at 12. The estimation is carried out without imposing additional penalty terms on the estimated coefficients. The benchmark approach based on kernel density estimation uses a Gaussian kernel, with the bandwidth selected according to Silverman's rule of thumb.
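The simulation design described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code. The function names are hypothetical. The sixth control-group class probability is taken to be 0.08 so that the vector sums to one. The reading of a counterfactual density as "conditional densities of one group mixed over the covariate distribution of another" is an assumption based on the decomposition described in the paper. The bandwidth uses the common $0.9 \min(\hat\sigma, \mathrm{IQR}/1.34)\, n^{-1/5}$ form of Silverman's rule, which may differ from the variant the authors used.

```python
import math
import numpy as np

# Class probabilities over the 8 covariate cells.
p_treat = np.full(8, 1 / 8)  # uniform for the treatment group
# Assumption: sixth entry set to 0.08 so the probabilities sum to one.
p_ctrl = np.array([0.25, 0.2, 0.14, 0.125, 0.095, 0.08, 0.06, 0.05])

# Beta parameters per covariate cell (treatment: alpha_1 = beta_1).
a1 = b1 = np.array([1, 5, 5, 9, 2, 6, 6, 10], dtype=float)
a0 = np.array([1, 10, 2, 11, 2, 11, 3, 12], dtype=float)
b0 = np.array([1, 2, 10, 11, 2, 3, 11, 12], dtype=float)


def simulate(n, probs, alpha, beta, rng):
    """Draw covariate cells, then outcomes from the cell-specific beta law."""
    cells = rng.choice(8, size=n, p=probs)
    y = rng.beta(alpha[cells], beta[cells])
    return cells, y


def beta_pdf(y, a, b):
    """Beta(a, b) density evaluated at points y in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(y) + (b - 1) * np.log1p(-y))


def mixture_density(y, weights, alpha, beta):
    """Mixture of beta densities: conditional laws weighted by cell probabilities."""
    return sum(w * beta_pdf(y, a, b) for w, a, b in zip(weights, alpha, beta))


def silverman_bandwidth(y):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    q75, q25 = np.percentile(y, [75, 25])
    return 0.9 * min(y.std(ddof=1), (q75 - q25) / 1.34) * y.size ** (-0.2)


def gaussian_kde(grid, y, h):
    """Gaussian kernel density estimate of the sample y on the given grid."""
    z = (grid[:, None] - y[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (y.size * h * math.sqrt(2 * math.pi))


rng = np.random.default_rng(0)
cells1, y1 = simulate(500, p_treat, a1, b1, rng)  # treatment group
cells0, y0 = simulate(500, p_ctrl, a0, b0, rng)   # control group

# Midpoints of 50 equally sized bins on (0, 1), as in the discretization step.
grid = (np.arange(50) + 0.5) / 50

# Control-group conditional densities mixed over treatment-group covariates.
f_ctrl_dens_treat_cov = mixture_density(grid, p_treat, a0, b0)

# Histogram discretization and the kernel benchmark for the control sample.
hist0, edges = np.histogram(y0, bins=50, range=(0.0, 1.0), density=True)
kde0 = gaussian_kde(grid, y0, silverman_bandwidth(y0))
```

The other counterfactual densities follow by swapping the weight vector or the parameter vectors passed to `mixture_density`.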
