Linear Latent Variable Models: The lava-package

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

An R package for specifying and estimating linear latent variable models is presented. The philosophy of the implementation is to separate the model specification from the actual data, which leads to a dynamic and easy way of modeling complex hierarchical structures. Several advanced features are implemented including robust standard errors for clustered correlated data, multigroup analyses, non-linear parameter constraints, inference with incomplete data, maximum likelihood estimation with censored and binary observations, and instrumental variable estimators. In addition an extensive simulation interface covering a broad range of non-linear generalized structural equation models is described. The model and software are demonstrated in data of measurements of the serotonin transporter in the human brain.

💡 Research Summary

The paper introduces lava, an R package designed for specifying and estimating linear latent variable models (LVMs) with a focus on flexibility, modularity, and advanced statistical capabilities. The core philosophy of lava is to decouple model specification from the data itself. Users first construct a “model object” where variables, structural paths, measurement equations, and parameter constraints are declared. This object can then be linked to any compatible data set for estimation, allowing the same model to be reused across multiple studies or to be incrementally expanded without rewriting code.

From a methodological standpoint, lava implements a comprehensive suite of features that go far beyond basic SEM functionality. It provides robust (sandwich) standard errors that are valid for clustered or correlated observations, addressing the common issue of intra‑cluster dependence in longitudinal or multilevel designs. Multi‑group analysis is fully supported; researchers can freely constrain or free parameters across groups, enabling tests of measurement invariance, group‑specific effects, and cross‑group comparisons.

Non‑linear parameter constraints are expressed as user‑defined functions. When these constraints are differentiable, lava automatically computes gradients via automatic differentiation, integrating them seamlessly into the likelihood optimization routine. This capability overcomes the restrictive linear constraints typical of many SEM packages.

Missing data are handled through Full‑Information Maximum Likelihood (FIML) and multiple imputation, ensuring unbiased estimates under the Missing at Random (MAR) assumption. Moreover, lava extends maximum‑likelihood estimation to censored and binary outcomes by incorporating Tobit models and logistic/probit link functions, making it suitable for neuroimaging, epidemiology, and other fields where detection limits or dichotomous measurements are common.

Endogeneity is addressed via instrumental variable (IV) estimators. Users can specify latent variables or observed exogenous variables as instruments, and lava offers both two‑stage least squares (2SLS) and Generalized Method of Moments (GMM) implementations. This allows researchers to obtain consistent estimates when latent constructs are correlated with error terms.

A distinctive aspect of lava is its built‑in simulation interface. Because the same model object used for estimation can be employed to generate synthetic data, users can define complex data‑generating processes that include non‑linear structural equations, non‑Gaussian error distributions, and intricate censoring mechanisms. Repeated simulations enable thorough assessment of estimator bias, efficiency, and coverage properties under realistic scenarios.

The authors demonstrate the package with an empirical application to serotonin transporter (SERT) binding measured by PET scans in the human brain. Multiple regional binding values are modeled as indicators of underlying latent factors, while covariates such as age, sex, and genotype define a multi‑group structure. Lava’s robust standard errors and multi‑group facilities reveal age‑related declines in SERT binding and subtle sex differences, and the censored‑data handling appropriately deals with binding values that exceed the scanner’s dynamic range. The example showcases how lava can simultaneously accommodate hierarchical modeling, measurement error, missingness, censoring, and group comparisons within a single coherent framework.

The paper also provides a practical walkthrough of the package’s workflow: creating the model object, specifying constraints, attaching data, running the estimator, extracting summaries, and visualizing results. Extensibility is emphasized; users can add custom link functions, alternative optimizers, or plug‑in new statistical routines without altering the core architecture.

In summary, lava represents a significant advancement for researchers working with linear latent variable models. By separating model definition from data, offering robust inference for clustered and censored data, supporting multi‑group and non‑linear constraints, handling missingness, and providing instrumental‑variable estimators, it equips analysts with a versatile, reproducible, and powerful toolset for modern structural equation modeling across a wide range of scientific disciplines.

Linear Latent Variable Models: The lava-package

💡 Research Summary

Comments & Academic Discussion

Leave a Comment