umx version 4.5: Extending Twin and Path-Based SEM in R with CLPM, MR-DoC, Definition Variables, Ωnyx Integration, and Censored Distributions
Structural Equation Modeling (SEM) is a flexible statistical technique with multiple applications, including behavioral genetics and social sciences. Building on the original design of the umx package, which improved accessibility to OpenMx by providing a concise syntax, umx v4.5 extends functionality for longitudinal and causal twin designs while improving interoperability with graphical modelling tools such as Ωnyx. New capabilities include: classic and modern cross-lagged panel models; Mendelian Randomization Direction-of-Causation (MR-DoC) twin models incorporating polygenic scores as instruments; support for definition variables directly in umxRAM(); a workflow for importing paths from Ωnyx; a dedicated function for incorporating censored variables into models, particularly valuable in biomarker research; improved covariate placeholder handling for definition variables; sex-limitation modelling across five twin groups, accommodating quantitative and qualitative sex differences; and covariate residualization in wide- or long-format data. These new functionalities accelerate reproducible, reliable, publication-ready twin and family modelling, and integrated journal-quality reporting, thereby lowering barriers to genetic epidemiological analyses.
💡 Research Summary
The paper presents umx version 4.5, a major upgrade to the R package that provides a user‑friendly front‑end for OpenMx‑based structural equation modeling, with a focus on twin and family designs. The authors describe a suite of new high‑level functions that dramatically simplify the specification, estimation, and reporting of complex longitudinal and causal models.
First, umxCLPM implements both the classic cross‑lagged panel model (CLPM) and the random‑intercept CLPM (RI‑CLPM). By adding a random intercept for each variable, the RI‑CLPM separates stable between‑person variance from within‑person dynamics, so that cross‑lagged paths reflect within‑person temporal effects rather than stable trait‑level differences. The function also supports the inclusion of instrumental variables, giving researchers extra degrees of freedom for within‑wave causal inference.
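For orientation, the RI‑CLPM decomposition described above is usually written as below. This is the standard notation from the RI‑CLPM literature, not notation taken from the umx documentation, and the symbol names are assumptions made for illustration.

```latex
\begin{aligned}
x_{it} &= \mu_{x,t} + \kappa_i + p_{it}, &
y_{it} &= \mu_{y,t} + \omega_i + q_{it}, \\
p_{it} &= \alpha\, p_{i,t-1} + \beta\, q_{i,t-1} + u_{it}, &
q_{it} &= \delta\, q_{i,t-1} + \gamma\, p_{i,t-1} + v_{it}.
\end{aligned}
```

Here \kappa_i and \omega_i are the person‑specific random intercepts, p_{it} and q_{it} are the within‑person deviations at wave t, \alpha and \delta are autoregressive paths, and \beta and \gamma are the cross‑lagged paths. Fixing the variances of \kappa_i and \omega_i to zero recovers the classic CLPM.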
Second, umxMRDoC extends Mendelian Randomization (MR) to twin data. The MR‑DoC model combines polygenic risk scores (PRS) as genetic instruments with the classic direction‑of‑causation (DoC) framework, exploiting cross‑twin cross‑trait correlations to identify direct pleiotropic paths and background confounding. The MR‑DoC2 variant further allows bidirectional causal effects and merges additive genetic (A) and shared environmental (C) components into a single familial resemblance factor (F), making the model identifiable even with sibling data. Both variants handle continuous and ordinal outcomes via a latent liability threshold.
Third, definition variables (row‑specific covariates that modify means, variances, or paths) can now be declared directly inside umxRAM() using umxPath(defn = "defVar"). This eliminates the need for manual algebraic specifications and integrates definition variables with the rest of the model. To mitigate the loss of data caused by row‑wise deletion when definition variables are missing, the authors introduce xmu_update_covar(), which inserts a placeholder value (99999) for missing covariate entries, preserving as much information as possible while preventing optimizer divergence.
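The placeholder idea can be sketched in a few lines. This is not the umx implementation: it is written in Python purely for illustration (umx is an R package), the function name and column names are hypothetical, and the rule of only filling a covariate when the matching phenotype is also missing is an assumption, made because a value of 99999 paired with an observed phenotype would distort the mean model.

```python
PLACEHOLDER = 99999  # placeholder value named in the text


def fill_missing_covariates(rows, phenotype, covariate, placeholder=PLACEHOLDER):
    """Return copies of rows where a missing covariate is replaced by the placeholder.

    Assumption (not confirmed by the source): the covariate is only filled when
    the phenotype in the same row is also missing, so the placeholder can never
    enter the likelihood through an observed outcome.
    """
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input data are left untouched
        if new_row[covariate] is None and new_row[phenotype] is None:
            new_row[covariate] = placeholder
        filled.append(new_row)
    return filled


# Hypothetical twin-1 columns: a phenotype (bmi_T1) and a definition variable (age_T1).
twins = [
    {"bmi_T1": 22.1, "age_T1": 34},      # complete row: unchanged
    {"bmi_T1": None, "age_T1": None},    # both missing: covariate gets the placeholder
    {"bmi_T1": 25.3, "age_T1": None},    # phenotype observed: covariate left missing
]
result = fill_missing_covariates(twins, "bmi_T1", "age_T1")
```

Under this assumed rule the second row keeps its place in the data set, while the third row is deliberately left for the analyst to handle, since no placeholder can stand in for a covariate that an observed outcome depends on.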
Fourth, the package adds full interoperability with the graphical SEM tool Ωnyx. Paths drawn in Ωnyx can be exported as OpenMx RAM code and automatically transformed into an ACE twin specification via umxTwinMaker(). This workflow bridges visual model building and advanced biometric analysis, reducing the amount of hand‑written code required for multi‑group twin models.
Fifth, the Integrated Censored‑Uncensored (ICU) method addresses biomarker data that are left‑censored at a limit of detection. The helper function xmu_make_bin_cont_pair_data() creates paired binary (below‑LOD) and continuous (above‑LOD) variables, fixes the binary threshold, and trims the expected covariance matrix to the observed pattern, enabling full‑information maximum likelihood estimation without cumbersome data preprocessing.
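The paired‑variable construction can be sketched as follows. This is a minimal illustration in Python, not the code behind xmu_make_bin_cont_pair_data(): the function name, the 0/1 coding of the indicator, and the use of None for missing values are all assumptions.

```python
def make_bin_cont_pair(values, lod):
    """Split one left-censored variable into a binary and a continuous variable.

    binary:     0 if the value is below the limit of detection (LOD), 1 otherwise
                (this coding is an assumption made for the sketch)
    continuous: the observed value when it is at or above the LOD, else None
    Missing inputs (None) stay missing in both outputs.
    """
    binary, continuous = [], []
    for value in values:
        if value is None:
            binary.append(None)
            continuous.append(None)
        elif value < lod:
            binary.append(0)          # censored: only "below LOD" is known
            continuous.append(None)
        else:
            binary.append(1)          # detected: the measured value is kept
            continuous.append(value)
    return binary, continuous


# Hypothetical biomarker readings with a limit of detection of 0.5.
readings = [0.2, 1.4, None, 0.5, 0.1, 3.0]
below_or_above, measured = make_bin_cont_pair(readings, lod=0.5)
```

Every observation then contributes to exactly one of the two variables, which is why a model fitted by full‑information maximum likelihood can use both the censored and the uncensored cases.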
Additional utilities include umxSummary() and umxSummarizeTwinData(), which generate publication‑ready LaTeX or HTML reports with fit indices, parameter tables, and interpretation guidelines; umx_residualize(), which residualizes one or more outcomes on covariates in both wide and long twin data formats; expanded sex‑limitation modeling across five twin groups (allowing quantitative and qualitative sex differences); and power analysis tools (umxPower, power.ACE.test) for ACE, CP, and simplex models.
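To make the residualization step concrete, here is a minimal sketch for wide‑format twin data with a single covariate. It is written in Python for illustration only and does not reproduce umx_residualize(): the function names, the "_T1"/"_T2" suffixes, and the choice to pool both twins into one regression before writing the residuals back are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residualize_wide(rows, outcome, covariate, suffixes=("_T1", "_T2")):
    """Replace each twin's outcome with its residual from one pooled regression."""
    xs, ys = [], []
    for row in rows:
        for suffix in suffixes:  # stack twin 1 and twin 2 into long format
            x, y = row[covariate + suffix], row[outcome + suffix]
            if x is not None and y is not None:
                xs.append(x)
                ys.append(y)
    intercept, slope = fit_line(xs, ys)

    out = []
    for row in rows:
        new_row = dict(row)
        for suffix in suffixes:  # write residuals back in wide format
            x, y = row[covariate + suffix], row[outcome + suffix]
            if x is None or y is None:
                new_row[outcome + suffix] = None
            else:
                new_row[outcome + suffix] = y - (intercept + slope * x)
        out.append(new_row)
    return out


# Hypothetical twin pairs: height (ht) residualized on age.
pairs = [
    {"ht_T1": 160.0, "ht_T2": 163.0, "age_T1": 20, "age_T2": 20},
    {"ht_T1": 171.0, "ht_T2": 168.0, "age_T1": 30, "age_T2": 30},
    {"ht_T1": 175.0, "ht_T2": None, "age_T1": 40, "age_T2": 40},
]
resid = residualize_wide(pairs, "ht", "age")
```

Pooling both twins gives a single set of regression coefficients, so twin 1 and twin 2 are adjusted identically, and the residuals remain in the one‑row‑per‑pair layout that twin models expect.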
Overall, umx 4.5 consolidates recent advances in OpenMx, Ωnyx, and modern causal inference into a coherent, reproducible workflow. By automating model specification, handling censored data, supporting definition variables, and providing ready‑made reporting, the package lowers technical barriers for genetic epidemiologists and social scientists seeking to fit sophisticated longitudinal and causal twin models.