Identification and Estimation of Network Models with Nonparametric Unobserved Heterogeneity
Homophily based on observables is widespread in networks. Therefore, homophily based on unobservables (fixed effects) is also likely to be an important determinant of the interaction outcomes. Failing to properly account for latent homophily (and other complex forms of unobserved heterogeneity) can result in inconsistent estimators and misleading policy implications. To address this concern, we consider a network model with nonparametric unobserved heterogeneity, leaving the role of the fixed effects unspecified. We argue that the interaction outcomes can be used to identify agents with the same values of the fixed effects. The variation in the observed characteristics of such agents allows us to identify the effects of the covariates, while controlling for the fixed effects. Building on these ideas, we construct several estimators of the parameters of interest and characterize their large sample properties. Numerical experiments illustrate the usefulness of the suggested approaches and support the asymptotic theory.
💡 Research Summary
The paper tackles a fundamental problem in dyadic network analysis: how to consistently estimate the effects of observed covariates when unobserved heterogeneity (fixed effects) may be highly non‑linear, multi‑dimensional, and may enter the outcome in unknown ways. The author proposes the general model:
Yᵢⱼ = F(Wᵢⱼ′β₀ + g(ξᵢ, ξⱼ)) + εᵢⱼ,
where Wᵢⱼ are pair‑specific observables, β₀ is the parameter of interest, ξᵢ and ξⱼ are latent agent‑specific fixed effects of unknown dimension, g(·,·) is an unrestricted smooth coupling function, F is a known monotone link (e.g., logistic), and εᵢⱼ is an idiosyncratic error. This formulation nests many existing specifications (additive fixed effects, linear factor models, stochastic block models, etc.) while allowing for latent homophily, non‑additive interactions, and arbitrary functional forms.
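To make the specification concrete, here is a minimal simulation sketch of one data-generating process that fits the model. Everything specific in it is an assumption chosen for illustration rather than taken from the paper: the function name `simulate_network`, a scalar ξ, the covariate Wᵢⱼ = |xᵢ − xⱼ|, the homophily coupling g(ξᵢ, ξⱼ) = −2|ξᵢ − ξⱼ|, a logistic F, and Gaussian errors.

```python
import numpy as np

def simulate_network(n=200, beta0=1.0, sigma=0.1, seed=0):
    """Draw one undirected network from Y_ij = F(W_ij * beta0 + g(xi_i, xi_j)) + eps_ij.

    All functional forms below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    xi = rng.uniform(0.0, 1.0, size=n)        # latent fixed effects (scalar here)
    x = xi + 0.3 * rng.normal(size=n)         # observed trait, correlated with xi
    W = np.abs(x[:, None] - x[None, :])       # pair-specific observable
    g = -2.0 * np.abs(xi[:, None] - xi[None, :])  # latent homophily: similar xi -> larger g
    index = beta0 * W + g
    F = 1.0 / (1.0 + np.exp(-index))          # known monotone link (logistic)
    eps = rng.normal(scale=sigma, size=(n, n))
    eps = np.triu(eps, 1)
    eps = eps + eps.T                         # symmetric errors for an undirected network
    Y = F + eps
    np.fill_diagonal(Y, 0.0)                  # self-links are not defined; set to 0
    return Y, W, xi

Y, W, xi = simulate_network(n=50)
```

Because x is correlated with ξ in this sketch, a regression of Y on W that ignores g would confound the covariate effect with latent homophily, which is the problem the paper addresses.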
Identification strategy
The key insight is that interaction outcomes can be used to construct a “pseudo‑distance” dᵢⱼ that equals zero if and only if ξᵢ = ξⱼ. Under homoskedastic errors, simple pairwise‑difference regressions of Y on W recover dᵢⱼ up to a constant that is common to all pairs, because the error variance does not depend on the pair. Agents with identical latent effects are thus identified as those with vanishing pseudo‑distance. Once such agents are found, the variation in their observed characteristics identifies β₀ while controlling for the unobserved effects.
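A short derivation may help show why homoskedasticity matters here. It is a sketch under simplifying assumptions that the summary does not state: F is the identity, and the errors are mean-zero, independent of (W, ξ), uncorrelated across dyads, and share a common variance σ². The symbols q²ᵢⱼ and d²ᵢⱼ, and the expectation 𝔼ₖ over third parties k with agents i and j held fixed, are illustrative notation; the paper's exact definitions may differ.

```latex
\begin{aligned}
Y_{ik} - Y_{jk} - (W_{ik} - W_{jk})'\beta
  &= (W_{ik} - W_{jk})'(\beta_0 - \beta)
     + g(\xi_i,\xi_k) - g(\xi_j,\xi_k)
     + \varepsilon_{ik} - \varepsilon_{jk}, \\[4pt]
d_{ij}^2
  &:= \min_{\beta}\, \mathbb{E}_k\!\left[\bigl((W_{ik} - W_{jk})'(\beta_0 - \beta)
     + g(\xi_i,\xi_k) - g(\xi_j,\xi_k)\bigr)^2\right], \\[4pt]
q_{ij}^2
  &:= \min_{\beta}\, \mathbb{E}_k\!\left[\bigl(Y_{ik} - Y_{jk}
     - (W_{ik} - W_{jk})'\beta\bigr)^2\right]
   = d_{ij}^2 + 2\sigma^2 .
\end{aligned}
```

Under these assumptions, q²ᵢⱼ is computable from observed outcomes and differs from d²ᵢⱼ only by the constant 2σ², so ranking pairs by q²ᵢⱼ is the same as ranking them by d²ᵢⱼ. Setting β = β₀ shows that d²ᵢⱼ = 0 whenever ξᵢ = ξⱼ. If the error variance varied across dyads, the additive term would depend on the pair and the ranking would no longer be preserved.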
Extension to heteroskedasticity
When εᵢⱼ is heteroskedastic, this argument fails because the error variance differs across pairs. The author therefore first recovers the “error‑free” component Y*ᵢⱼ = F(Wᵢⱼ′β₀ + g(ξᵢ, ξⱼ)) using a modified graphon‑estimation technique inspired by Zhang et al. (2017). By adapting matrix completion and spectral methods, the estimator achieves uniform consistency in the max‑norm, meaning that all entries Y*ᵢⱼ are estimated accurately at the same time. This is a stronger result than the usual Frobenius‑norm rates found in the graphon literature. Uniform recovery allows the model to be treated as if εᵢⱼ were absent, effectively reducing the problem to the homoskedastic case.
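The denoising idea can be illustrated with a simplified neighborhood-averaging smoother in the spirit of Zhang et al. (2017). This is not the paper's modified estimator: the function name `neighborhood_smooth`, the tuning fraction `frac`, and the two-block toy signal are all assumptions made for illustration. Each row is replaced by the average of the rows that look most similar to it, where similarity is judged from row inner products.

```python
import numpy as np

def neighborhood_smooth(Y, frac=0.1):
    """Estimate the error-free matrix by averaging rows of similar agents.

    Simplified stand-in for a neighborhood-smoothing graphon estimator.
    """
    n = Y.shape[0]
    G = Y @ Y.T / n                          # G[i, k] = <Y_i, Y_k> / n
    m = max(1, int(round(frac * n)))         # neighborhood size
    idx = np.arange(n)
    Y_hat = np.empty((n, n), dtype=float)
    for i in range(n):
        diff = np.abs(G[i][None, :] - G)     # diff[j, k] = |G[i, k] - G[j, k]|
        diff[:, i] = 0.0                     # exclude k = i
        diff[idx, idx] = 0.0                 # exclude k = j
        dist = diff.max(axis=1)              # dissimilarity between rows i and j
        nbrs = np.argsort(dist)[:m]          # m most similar rows (includes i itself)
        Y_hat[i] = Y[nbrs].mean(axis=0)
    return (Y_hat + Y_hat.T) / 2.0           # symmetrize

# Toy check on a two-block signal with additive symmetric noise (illustrative only).
rng = np.random.default_rng(1)
n = 120
block = np.repeat([0, 1], n // 2)
signal = (block[:, None] == block[None, :]).astype(float)
noise = np.triu(rng.normal(scale=0.3, size=(n, n)), 1)
Y_noisy = signal + noise + noise.T

Y_smooth = neighborhood_smooth(Y_noisy, frac=0.1)
off_diag = ~np.eye(n, dtype=bool)
err_raw = np.max(np.abs(Y_noisy - signal)[off_diag])
err_smooth = np.max(np.abs(Y_smooth - signal)[off_diag])
```

The comparison uses the maximum entrywise error rather than an average error, mirroring the max‑norm criterion emphasized in the text.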
Estimation of β₀ and g
With the estimated error‑free outcomes in hand, the author proceeds in two steps. First, estimate the pseudo‑distances d̂ᵢⱼ from the denoised outcomes using the same pairwise‑difference logic. Second, for each pair of agents (i, j) with small d̂ᵢⱼ (i.e., similar latent effects), run a regression of the denoised differences (Yᵢₖ – Yⱼₖ) on (Wᵢₖ – Wⱼₖ) across all third parties k. The slope of this regression consistently estimates β₀. The paper derives the bias introduced by imperfect matching (when d̂ᵢⱼ ≠ 0) and shows that, under mild regularity conditions, the bias is of second order, so the estimator remains √n‑consistent. Moreover, because the entire matrix of pair‑specific fixed effects g(ξᵢ, ξⱼ) can be recovered uniformly, the method also yields consistent estimates of pair‑specific effects, enabling the calculation of average partial effects and other policy‑relevant quantities.
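The two steps can be sketched end to end on simulated data. The sketch makes several assumptions that are not from the paper: F is the identity, W is scalar, errors are homoskedastic (so the denoising stage is skipped), each agent is matched only to its single nearest agent in estimated pseudo-distance, and the matched differences are pooled into one least-squares fit. The data-generating process and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, sigma = 100, 1.0, 0.1

# Illustrative DGP: x is correlated with the latent xi, and g encodes homophily.
xi = rng.uniform(size=n)
x = xi + 0.3 * rng.normal(size=n)
W = np.abs(x[:, None] - x[None, :])
g = -3.0 * np.abs(xi[:, None] - xi[None, :])
eps = np.triu(rng.normal(scale=sigma, size=(n, n)), 1)
Y = beta0 * W + g + eps + eps.T              # F is the identity in this sketch

def diffs(i, j):
    """Differences of outcomes and covariates across all third parties k."""
    mask = np.ones(n, dtype=bool)
    mask[[i, j]] = False
    return Y[i, mask] - Y[j, mask], W[i, mask] - W[j, mask]

# Step 1: pseudo-distance = residual mean square of the pairwise-difference regression.
d2 = np.full((n, n), np.inf)                 # diagonal stays inf so no self-matches
for i in range(n):
    for j in range(i + 1, n):
        dy, dw = diffs(i, j)
        b = (dw @ dy) / (dw @ dw)            # pair-specific OLS slope (no intercept)
        d2[i, j] = d2[j, i] = np.mean((dy - b * dw) ** 2)

# Step 2: match each agent to its nearest agent and pool the differenced regressions.
num = den = 0.0
for i in range(n):
    j = int(np.argmin(d2[i]))
    dy, dw = diffs(i, j)
    num += dw @ dy
    den += dw @ dw
beta_matched = num / den

# Benchmark: naive OLS slope of Y on W over all dyads, ignoring g.
off = ~np.eye(n, dtype=bool)
beta_naive = np.cov(W[off], Y[off])[0, 1] / np.var(W[off], ddof=1)

print(f"true beta0 = {beta0}, matched = {beta_matched:.3f}, naive = {beta_naive:.3f}")
```

Matching on a single nearest neighbor is the simplest possible choice; kernel weights over several close matches would be a natural variation. The naive benchmark is included only to show how omitted latent homophily distorts the slope in this particular design.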
Asymptotic theory
The analysis is conducted under a “single‑network” asymptotic framework where the number of nodes n grows while the network remains a single realization. The first‑stage graphon estimator converges at rate Oₚ(n^(−1/2)) in the max‑norm; the second‑stage β̂ converges at the same √n rate provided the matching set is sufficiently rich (i.e., enough variation in W among agents with similar ξ). The paper also establishes uniform consistency for the estimated coupling function ĝᵢⱼ.
Monte‑Carlo evidence
Simulations explore a variety of data‑generating processes: homoskedastic vs. heteroskedastic errors, linear vs. highly non‑linear g, different link functions F, and both low‑rank and full‑rank W matrices. Across all settings, the proposed estimator exhibits dramatically lower bias and root‑mean‑square error than traditional additive‑fixed‑effects or low‑rank factor approaches, especially when latent homophily is present.
Relation to existing literature
The work extends the additive fixed‑effects literature (Graham 2017, Dzemski 2018) by allowing an unrestricted g. It differs from recent semiparametric extensions (Gao et al. 2023) that still impose scalar ξ and monotonicity, and from clustering‑based methods (Bonhomme et al. 2022) that cannot separate ξ from X in dyadic settings. The uniform graphon recovery also contributes to the statistics literature, where most results focus on average‑error metrics.
Implications
By jointly identifying latent similarity and observable effects within a single network, the methodology enables researchers to disentangle true covariate impacts from spurious correlations driven by unobserved homophily. This is crucial for policy analysis in trade networks, education‑teacher matching, hospital‑patient referrals, and any setting where agents self‑select into links based on unobserved traits. Moreover, the ability to estimate pair‑specific fixed effects opens the door to targeted interventions that depend on the exact dyad composition.
In sum, the paper delivers a comprehensive identification argument, a novel two‑stage estimator that works under both homoskedastic and heteroskedastic errors, rigorous asymptotic guarantees, and compelling simulation evidence, thereby substantially advancing the econometrics of networks with non‑parametric unobserved heterogeneity.