A note on the lack of symmetry in the graphical lasso
The graphical lasso (glasso) is a widely used, fast algorithm for estimating sparse inverse covariance matrices. The glasso solves an L1 penalized maximum likelihood problem and is available as an R library on CRAN. The output from the glasso, a regularized covariance matrix estimate and a sparse inverse covariance matrix estimate, not only identifies a graphical model but can also serve as intermediate input into multivariate procedures such as PCA, LDA, MANOVA, and others. The glasso indeed produces a covariance matrix estimate which solves the L1 penalized optimization problem in a dual sense; however, the method used to produce the inverse covariance matrix estimate after this optimization is inexact and may yield asymmetric estimates. The problem is exacerbated when the amount of L1 regularization applied is small, which in turn is more likely to occur when the true underlying inverse covariance matrix is not sparse. The lack of symmetry has potential consequences. First, it implies that the covariance and inverse covariance estimates are not numerical inverses of one another; second, asymmetry can lead to negative or complex eigenvalues, rendering many multivariate procedures that depend on the inverse covariance estimate unusable. We demonstrate this problem, explain its causes, and propose possible remedies.
💡 Research Summary
The paper investigates a subtle yet important numerical flaw in the widely used graphical lasso (glasso) algorithm for estimating sparse inverse covariance (precision) matrices. Glasso solves an ℓ₁‑penalized maximum‑likelihood problem by optimizing the dual formulation, which yields a regularized covariance estimate $\hat\Sigma_\lambda$ that is a valid solution of the original convex problem. However, the routine that constructs the corresponding precision estimate $\hat\Omega_\lambda$ after convergence is mathematically inexact. It assembles $\hat\Omega_\lambda$ from intermediate quantities derived from each column‑wise lasso sub‑problem, using the inverse of the partially updated covariance matrix at that stage rather than the final $\hat\Sigma_\lambda$. Consequently, the resulting $\hat\Omega_\lambda$ is generally asymmetric ($\hat\Omega_\lambda \neq \hat\Omega_\lambda^{T}$) and does not satisfy $\hat\Omega_\lambda = \hat\Sigma_\lambda^{-1}$.
The authors demonstrate the problem with two illustrative examples. In a low‑dimensional setting (p = 5, n = 500) with a small regularization parameter (λ = 0.0033), several off‑diagonal entries of $\hat\Omega_\lambda$ are non‑zero in one triangular half while being exactly zero in the opposite half, leading to relative errors of 100% or even undefined ones (division by zero). In a high‑dimensional AR(1) model (p = 500, n = 250), the discrepancy between the edge sets defined by the upper and lower triangles of $\hat\Omega_\lambda$, as quantified by $|E_1 \Delta E_2|$, grows dramatically as λ decreases. These examples show that asymmetry can produce two distinct graphical models from a single glasso run.
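The edge-set comparison described above is easy to reproduce. The sketch below, using a tiny hand-made asymmetric matrix (not data from the paper), reads one graph off each triangle and measures their symmetric difference; the function name `edge_set` and the toy entries are illustrative choices.

```python
import numpy as np

def edge_set(M, tol=0.0):
    """Edges implied by the strict upper triangle of a (possibly asymmetric) matrix."""
    p = M.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if abs(M[i, j]) > tol}

# Toy asymmetric "precision" estimate of the kind glasso can return:
# entries non-zero in the upper triangle but exactly zero in the lower.
Omega = np.array([[2.0, 0.3, 0.0],
                  [0.0, 2.0, 0.1],
                  [0.0, 0.0, 2.0]])

E_upper = edge_set(Omega)        # graph read off the upper triangle
E_lower = edge_set(Omega.T)      # graph read off the lower triangle
disagreement = E_upper ^ E_lower  # the |E1 Δ E2| discrepancy measure
```

Here the two triangles yield entirely different graphs from the same matrix, which is exactly the reproducibility hazard the examples illustrate.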
The paper explains why this occurs. The glasso algorithm iteratively updates a matrix $W$ (the current estimate of $\Sigma$) column by column using lasso coefficients $\hat\beta^{(j)}$. After convergence, $\hat\Sigma_\lambda$ is set to the final $W$. The precision estimate is then assembled from formulas (2), which involve the inverse of each intermediate $W^{(j)}$ rather than the inverse of the final $W$. Because each column of $\hat\Omega_\lambda$ is taken from a different intermediate inverse, the matrix is not guaranteed to be symmetric, and convergence of $W$ does not imply convergence of its inverse. The issue is amplified when the sample covariance $S$ is rank‑deficient (common when n ≪ p) and when λ is small; Lemma 1 proves that, under rank deficiency, entries of $\hat\Omega_\lambda$ can diverge as λ → 0.
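To make the column-assembly step concrete, here is a sketch of the standard partitioned-inverse formulas glasso-style algorithms use to recover one precision column from a covariance iterate $W$ and the lasso coefficients for that column (the function name and variable layout are my own, not the paper's code). The point is that if a *different* intermediate $W$ is used for each column, nothing forces the assembled columns into a symmetric matrix.

```python
import numpy as np

def precision_column(W, beta_hat, j):
    """Recover column j of a precision estimate from partitioned-inverse
    formulas of the form theta_22 = 1 / (w_22 - w_12' beta),
    theta_12 = -beta * theta_22.  W is the covariance iterate at the time
    column j was last updated; using a different intermediate W per column
    is what breaks symmetry of the assembled matrix."""
    p = W.shape[0]
    idx = [k for k in range(p) if k != j]
    w12 = W[idx, j]                              # off-diagonal part of column j
    theta_22 = 1.0 / (W[j, j] - w12 @ beta_hat)  # diagonal entry
    col = np.zeros(p)
    col[j] = theta_22
    col[idx] = -beta_hat * theta_22              # off-diagonal entries
    return col
```

As a sanity check, when `beta_hat` is the exact (unpenalized) regression coefficient $W_{11}^{-1} w_{12}$, the formulas reproduce the corresponding column of $W^{-1}$ exactly; the asymmetry arises only because the algorithm evaluates them against different intermediate matrices.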
The lack of symmetry has practical consequences. First, $\hat\Omega_\lambda$ is not a mathematically valid estimator of the true precision matrix, so the inferred graph may be incorrect. Second, the matrix may have non‑positive or complex eigenvalues, rendering downstream multivariate techniques such as LDA, PCA, or MANOVA unstable or undefined. Third, two different graphs can be extracted from the same output, undermining reproducibility.
To remedy the problem, the authors propose three straightforward strategies:
- **Numerical inversion** – Directly compute $\hat\Omega_\lambda = \hat\Sigma_\lambda^{-1}$ after glasso converges. This restores symmetry and makes the two estimates exact numerical inverses of one another. Although it incurs an $O(p^3)$ cost, this is acceptable for the problem sizes glasso currently handles (up to p ≈ 2000). Small numerical artifacts can be removed by hard‑thresholding near‑zero entries, and the resulting matrix should be checked for positive‑definiteness.
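The first remedy amounts to a few lines of linear algebra. A minimal sketch, assuming numpy; the function name and the illustrative `threshold` value are my own choices, not prescriptions from the paper:

```python
import numpy as np

def invert_and_clean(Sigma_hat, threshold=1e-8):
    """Remedy 1 (sketch): invert the glasso covariance estimate directly,
    hard-threshold numerical noise, and verify positive-definiteness."""
    Omega = np.linalg.inv(Sigma_hat)          # symmetric up to round-off
    Omega = (Omega + Omega.T) / 2.0           # symmetrize away round-off error
    Omega[np.abs(Omega) < threshold] = 0.0    # hard-threshold near-zero entries
    if np.linalg.eigvalsh(Omega).min() <= 0:  # check positive-definiteness
        raise ValueError("precision estimate is not positive definite")
    return Omega
```

The explicit symmetrization and eigenvalue check cost little relative to the $O(p^3)$ inversion itself.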
- **Modified glasso output** – Use only the upper triangle of the original $\hat\Omega_\lambda$, mirrored across the diagonal, as the final estimate. Since the upper triangle contains the most recent column updates, mirroring it yields a symmetric matrix at negligible extra cost. This approach does not satisfy $\hat\Omega_\lambda = \hat\Sigma_\lambda^{-1}$, but it eliminates the asymmetry.
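The mirroring step can be written as a one-liner with numpy triangle utilities; this is a sketch under the assumption that the upper triangle is the half to keep, as the summary describes:

```python
import numpy as np

def symmetrize_from_upper(Omega_hat):
    """Remedy 2 (sketch): keep the diagonal and upper triangle of the glasso
    precision output (the most recently updated columns) and mirror the
    strictly-upper part into the lower triangle."""
    return np.triu(Omega_hat) + np.triu(Omega_hat, k=1).T
```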
- **Iterative proportional fitting (IPF)** – Adopt the sparsity pattern derived from the upper triangle, enforce symmetry, and then run IPF to jointly estimate a covariance matrix and its inverse that respect the chosen graph. IPF guarantees $\Omega = \Sigma^{-1}$ at each iteration and preserves the sparsity pattern, though the resulting pair does not solve the original ℓ₁‑penalized likelihood exactly. Its computational cost depends on the size of the largest clique in the graph (typically modest).
The authors also suggest practical diagnostics for detecting insufficient regularization: unusually high average node degree, a large condition number of $\hat\Sigma_\lambda$, or a pronounced asymmetry measure. They reference recent work (Won et al., 2012) on conditioning constraints as a possible guide for selecting λ.
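The three diagnostics are cheap enough to run after every fit. A minimal sketch, assuming numpy; the relative Frobenius-norm asymmetry measure is one reasonable choice of many, not necessarily the one used in the paper:

```python
import numpy as np

def glasso_diagnostics(Sigma_hat, Omega_hat, edge_tol=0.0):
    """Sketch of under-regularization diagnostics: average node degree of the
    estimated graph, condition number of the covariance estimate, and a
    relative asymmetry measure for the precision estimate."""
    p = Omega_hat.shape[0]
    adjacency = (np.abs(Omega_hat) > edge_tol) & ~np.eye(p, dtype=bool)
    avg_degree = adjacency.sum() / p                  # unusually high => suspect
    cond = np.linalg.cond(Sigma_hat)                  # large => ill-conditioned
    asym = np.linalg.norm(Omega_hat - Omega_hat.T) / np.linalg.norm(Omega_hat)
    return avg_degree, cond, asym
```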
In summary, while glasso remains a powerful tool for high‑dimensional sparse modeling, its default precision‑matrix output can be asymmetric and numerically inconsistent with the covariance estimate, especially when λ is small and the sample covariance is rank‑deficient. The paper clarifies the mathematical origin of this issue, illustrates its impact on graph recovery and downstream analyses, and offers concrete, low‑cost remedies that practitioners can adopt to ensure symmetric, positive‑definite precision estimates suitable for further multivariate procedures.