Reconsidering the asymptotic null distribution of likelihood ratio tests for genetic linkage in multivariate variance components models

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the Original Paper Viewer below or the original arXiv source.

Accurate knowledge of the null distribution of hypothesis tests is important for valid application of the tests. In previous papers and software, the asymptotic null distribution of likelihood ratio tests for detecting genetic linkage in multivariate variance components models has been stated to be a mixture of chi-square distributions with binomial mixing probabilities. Here we show, by simulation and by theoretical arguments based on the geometry of the parameter space, that all aspects of the previously stated asymptotic null distribution are incorrect: both the binomial mixing probabilities and the chi-square components. Correcting the null distribution gives more conservative critical values than previously stated, yielding P values that can easily be ten times larger. The true mixing probabilities give the highest probability to the case where all variance parameters are estimated positive, and the mixing components show severe departures from chi-square distributions. Thus, the asymptotic null distribution has complex features that raise challenges for the assessment of significance of multivariate linkage findings. We propose a method to generate an asymptotic null distribution that is much faster than other empirical methods such as gene-dropping, enabling us to obtain P values with higher precision more efficiently.


💡 Research Summary

The paper addresses a fundamental flaw in the widely‑used null distribution for likelihood‑ratio tests (LRTs) applied to multivariate variance‑components models in genetic linkage analysis. Historically, software packages such as SOLAR and MERLIN have assumed that, under the null hypothesis of no linkage, the LRT statistic follows a mixture of chi‑square distributions with binomial mixing probabilities. This assumption stems from a simplistic view of the parameter space: each of the k variance components is either exactly zero or positive, leading to 2^k possible configurations. The configuration with i positive components is associated with a χ² distribution with i degrees of freedom, weighted by the binomial probability C(k,i)/2^k.
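As a concrete instance of this previously assumed reference distribution (the one the paper argues is incorrect), the mixture p‑value for k = 2 has a closed form: weights C(2,i)/2² = (1/4, 1/2, 1/4) on χ²₀ (a point mass at zero), χ²₁, and χ²₂. A minimal sketch using only the Python standard library; the function name is illustrative, not from any linkage package:

```python
import math

def binomial_mixture_pvalue_k2(x):
    """P(T > x) under the classical 1/4 : 1/2 : 1/4 mixture of
    chi2_0, chi2_1, chi2_2 assumed for k = 2 variance components."""
    if x <= 0:
        return 1.0
    sf_chi2_0 = 0.0                            # chi2_0 is a point mass at zero
    sf_chi2_1 = math.erfc(math.sqrt(x / 2.0))  # survival function of chi2_1
    sf_chi2_2 = math.exp(-x / 2.0)             # survival function of chi2_2
    return 0.25 * sf_chi2_0 + 0.5 * sf_chi2_1 + 0.25 * sf_chi2_2
```

Under this assumed mixture, a statistic of 3.84 (the χ²₁ 5% critical value) gives a p‑value of about 0.06 rather than 0.05, because a quarter of the mass sits on χ²₂; the paper's result is that even this adjusted reference is too liberal.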

Through extensive Monte‑Carlo simulations covering a range of trait numbers (k = 2–5), sample sizes, and correlation structures, the authors demonstrate two systematic discrepancies. First, the empirical mixing proportions differ dramatically from the binomial values; the configuration where all variance components are estimated as positive actually occurs most frequently, contrary to the binomial model that assigns it the lowest probability. Second, the distribution of each mixture component deviates markedly from a pure chi‑square shape: the empirical densities exhibit heavier tails, shifted modes, and asymmetry, indicating that the standard χ² critical values are overly liberal.

To explain these observations, the authors invoke the geometry of the constrained parameter space. The variance‑components vector must lie in the cone of positive‑semidefinite covariance matrices, a non‑polyhedral, high‑dimensional object with a complex boundary. By applying tangent‑cone theory, they show that, under the null, the LRT statistic is not simply the sum of squared normal variables but rather the squared length of the projection of a multivariate normal vector onto this cone. The distribution of this projection depends on the cone’s shape, yielding non‑standard mixture weights and non‑chi‑square component distributions. Consequently, the binomial mixing probabilities are replaced by volume‑ratio probabilities derived from the cone’s geometry, and each component follows a “cone‑projected chi‑square” rather than a textbook χ².
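The projection argument can be made concrete in the simplest possible case. If the constraint cone were the nonnegative orthant (k independent variance parameters, identity asymptotic covariance), projecting a standard normal vector onto the cone and taking its squared length does reproduce the binomial weights; the paper's point is that the curved boundary of the positive‑semidefinite cone breaks exactly this calculation. A sketch of the flat‑cone baseline (illustrative, not the paper's derivation):

```python
import numpy as np

# Chi-bar-square via cone projection: for the nonnegative orthant,
# the squared projection length of a standard normal vector is the
# asymptotic LRT statistic, and the mixing weights come out binomial.
rng = np.random.default_rng(0)
k = 2
z = rng.standard_normal((200_000, k))
proj = np.clip(z, 0.0, None)           # Euclidean projection onto the orthant
lrt = (proj ** 2).sum(axis=1)          # squared length of the projection
n_pos = (proj > 0).sum(axis=1)         # components estimated strictly positive
weights = np.bincount(n_pos, minlength=k + 1) / len(z)
print(weights)  # approx [0.25, 0.5, 0.25]: binomial, but only for this flat cone
```

For a non‑polyhedral cone such as the PSD cone, neither the weights nor the conditional component distributions obey this orthant calculation, which is the geometric source of the discrepancies described above.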

Recognizing that traditional gene‑dropping simulations are computationally prohibitive for large studies, the authors propose a fast algorithm that samples directly from the asymptotic null distribution instead of re‑simulating the data. The method numerically approximates the tangent cone, draws high‑dimensional normal samples, projects them onto the cone, and records the resulting LRT values. Because the projection step can be performed efficiently using quadratic programming, the approach generates the asymptotic null distribution orders of magnitude faster than gene‑dropping while retaining high precision. Validation against exhaustive simulations confirms that the new algorithm reproduces the empirical mixing proportions and component shapes.
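The pipeline (draw normal samples, project onto the cone, record squared projection lengths, read quantiles off the empirical distribution) can be sketched with a PSD‑cone toy in which the projection has a closed form via eigenvalue clipping rather than quadratic programming. This is a generic stand‑in, not the paper's algorithm: the real method projects onto the model's tangent cone under the appropriate metric, and the mixing weights below are specific to this toy symmetric Gaussian draw, not the weights reported in the paper.

```python
import numpy as np

def project_psd(a):
    """Euclidean projection of a symmetric matrix onto the PSD cone:
    keep the eigenvectors, clip negative eigenvalues at zero."""
    w, v = np.linalg.eigh(a)
    return (v * np.clip(w, 0.0, None)) @ v.T

rng = np.random.default_rng(1)
d, n = 2, 50_000
stats = np.empty(n)
rank = np.empty(n, dtype=int)
for i in range(n):
    g = rng.standard_normal((d, d))
    a = (g + g.T) / np.sqrt(2.0)       # symmetric Gaussian draw (toy stand-in)
    p = project_psd(a)
    stats[i] = (p ** 2).sum()          # squared Frobenius length of projection
    rank[i] = int((np.linalg.eigvalsh(a) > 1e-12).sum())

weights = np.bincount(rank, minlength=d + 1) / n   # empirical mixing weights
crit95 = np.quantile(stats, 0.95)                  # empirical 5% critical value
```

Because each replicate is a single projection rather than a full gene‑drop plus model refit, millions of draws are feasible, which is what allows small p‑values to be estimated with usable precision.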

Applying the corrected null distribution to a real dataset involving blood pressure, cholesterol, and body‑mass index illustrates the practical impact. A linkage signal that was significant (p ≈ 0.02) using the conventional binomial‑mix χ² reference becomes non‑significant (p ≈ 0.18) with the new distribution, highlighting the risk of inflated false‑positive rates in previous multivariate linkage studies.

The paper concludes by discussing limitations: the current theory assumes linear mixed models with normal errors and a positive‑semidefinite covariance constraint; extensions to non‑linear models, non‑Gaussian traits, or additional constraints will require further geometric analysis. Nonetheless, the work fundamentally revises the theoretical foundation of LRTs in multivariate linkage analysis, provides a practical tool for accurate p‑value computation, and urges the community to adopt the revised null distribution to ensure more reliable genetic discoveries.

