Exact Instance Compression for Convex Empirical Risk Minimization via Color Refinement
Empirical risk minimization (ERM) can be computationally expensive, with standard solvers scaling poorly even in the convex setting. We propose a novel lossless compression framework for convex ERM based on color refinement, extending prior work from linear programs and convex quadratic programs to a broad class of differentiable convex optimization problems. We develop concrete algorithms for a range of models, including linear and polynomial regression, binary and multiclass logistic regression, regression with elastic-net regularization, and kernel methods such as kernel ridge regression and kernel logistic regression. Numerical experiments on representative datasets demonstrate the effectiveness of the proposed approach.
💡 Research Summary
The paper introduces a novel, lossless compression framework for convex empirical risk minimization (ERM) problems based on the classic graph‑theoretic technique of color refinement. While prior work has shown that color refinement can be used to exactly reduce linear programs (LPs) and convex quadratic programs (QPs) by aggregating variables and constraints that belong to the same equitable partition, this research extends the idea to the much broader class of differentiable convex programs that underlie most modern machine learning models.
The authors first formalize the notion of a reduction coloring (Definition 3.1). A pair of partitions—one on the constraints (P) and one on the variables (Q)—constitutes a reduction coloring when, for any two variables in the same color class, the gradients of the objective and of each aggregated constraint are identical at any point where those variables share the same value; likewise, all constraints in a color class must share the same right‑hand side and bound values, and all variables in a color class must share identical box bounds. Under these conditions, a reduced problem is defined (Definition 3.2) by substituting a single representative variable for each Q‑class and a single representative constraint for each P‑class, while scaling the remaining data with the appropriate stochastic partition matrices.
The central theoretical contribution is Theorem 3.3, which proves that any optimal solution of the original convex program can be mapped to an optimal solution of the reduced program via the scaled partition matrix Π_scaled^Q, and conversely, any optimal solution of the reduced program lifts to an optimal solution of the original problem via Π^Q. The proof relies on convexity, differentiability, and Lagrangian optimality conditions, and is fully detailed in the appendix.
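The reduce-and-lift maps of Theorem 3.3 can be illustrated with the partition matrices themselves. The following is a minimal sketch, not the paper's code: the class assignment, the 5-variable example, and the variable names (`Pi_Q`, `Pi_scaled`) are illustrative assumptions; the only grounded facts are that Π^Q is the indicator matrix of the variable partition, its column-scaled version averages within each class, and class-constant optima survive the round trip exactly.

```python
import numpy as np

# Hypothetical example: Q assigns each of n = 5 original variables
# to one of q = 2 color classes (assumed data, not from the paper).
Q = np.array([0, 0, 1, 1, 1])
n, q = len(Q), Q.max() + 1

# Indicator matrix Pi_Q (n x q): entry (i, c) = 1 iff variable i has color c.
Pi_Q = np.zeros((n, q))
Pi_Q[np.arange(n), Q] = 1.0

# Column-stochastic scaling: divide each column by its class size,
# so that Pi_scaled.T averages the variables within each class.
Pi_scaled = Pi_Q / Pi_Q.sum(axis=0)

# A class-constant point (as an optimum of the original program would be,
# up to the symmetrization argument in the proof).
x_opt = np.array([2.0, 2.0, -1.0, -1.0, -1.0])

x_red = Pi_scaled.T @ x_opt   # map original solution to the reduced space
x_lift = Pi_Q @ x_red         # lift reduced solution back to the original space

assert np.allclose(x_lift, x_opt)  # lossless round trip on class-constant points
```

The round trip is exact precisely because the point is constant on each color class, which is the structural property the reduction-coloring conditions guarantee at optimality.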
A key insight is that reduction colorings are strictly more powerful than symmetry‑based reductions. The authors define the automorphism group Γ of a convex program (permutations of variables and constraints that leave the objective, bounds, and constraints invariant) and show in Theorem 3.4 that the coarsest reduction coloring is always at least as coarse as the orbit partition induced by Γ. Consequently, even when a problem exhibits no non‑trivial permutation symmetry, substantial compression may still be achievable if the gradients and constraint values align across groups of variables or samples.
To compute the coarsest equitable partition efficiently, the paper adapts the classic color‑refinement algorithm to weighted bipartite matrices (Algorithm 1). Starting from an initial coloring of rows (constraints) and columns (variables), the algorithm iteratively refines colors based on weighted sums of matrix entries until convergence. The authors prove a runtime of O(m n (log m + log n)) for dense matrices, and O(nnz(A) (log m + log n)) for sparse data, where m and n are the numbers of constraints and variables, respectively.
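The refinement loop can be sketched as follows. This is a naive reference version, not the paper's Algorithm 1: it recomputes every signature each round (so it does not attain the stated O(mn(log m + log n)) bound), and the tie-breaking and use of weight multisets as signatures are implementation assumptions. The idea it demonstrates is the grounded one: rows and columns are alternately recolored by their old color plus the multiset of (neighbor color, weight) pairs until the partition stabilizes.

```python
import numpy as np

def _relabel(signatures):
    # Canonically map distinct signatures to integer colors, in order of
    # first occurrence, so a stable partition yields identical labels.
    table = {}
    return np.array([table.setdefault(s, len(table)) for s in signatures])

def color_refine(A, max_iter=100):
    """Naive color refinement on a weighted bipartite matrix A
    (rows = constraints, columns = variables). Illustrative sketch only."""
    m, n = A.shape
    row_c = np.zeros(m, dtype=int)
    col_c = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # New row signature: old color + sorted multiset of (column color,
        # entry weight) pairs over the row's nonzero entries; dually for columns.
        row_sig = [(row_c[i], tuple(sorted((col_c[j], A[i, j])
                    for j in range(n) if A[i, j] != 0))) for i in range(m)]
        col_sig = [(col_c[j], tuple(sorted((row_c[i], A[i, j])
                    for i in range(m) if A[i, j] != 0))) for j in range(n)]
        new_row, new_col = _relabel(row_sig), _relabel(col_sig)
        if np.array_equal(new_row, row_c) and np.array_equal(new_col, col_c):
            break  # partition is equitable: no class splits any further
        row_c, col_c = new_row, new_col
    return row_c, col_c

# Hypothetical 3x3 instance: rows 0 and 1 (and columns 0 and 1) are
# indistinguishable and end up in the same color class.
A = np.array([[1.0, 1.0, 2.0],
              [1.0, 1.0, 2.0],
              [3.0, 3.0, 0.0]])
rows, cols = color_refine(A)
```

On this instance the algorithm converges to the coarsest equitable partition `rows = [0, 0, 1]`, `cols = [0, 0, 1]`, so the 3×3 system would compress to a 2×2 reduced problem.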
The general framework is instantiated for several widely used ERM models:
- Least‑squares linear and polynomial regression – identical rows or columns of the design matrix lead to aggregation of samples or features.
- Binary and multiclass logistic regression – samples sharing the same label and identical feature vectors can be merged; the gradient of the logistic loss depends only on the inner product, enabling further reductions.
- Elastic‑net regularized regression – variables with identical (ℓ₁, ℓ₂) regularization coefficients are grouped.
- Kernel ridge regression and kernel logistic regression – the kernel matrix is treated as a weighted bipartite graph; rows/columns with identical kernel similarity profiles are aggregated.
For each model, the paper derives concrete sufficient conditions that can be checked directly from the data, labels, and sample weights, and then applies the color‑refinement algorithm to obtain the reduced problem automatically.
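For the logistic-regression case, the simplest sufficient condition above (identical feature vector and identical label) can be checked and exploited directly. The sketch below uses synthetic data and assumed helper names (`logloss`, `w_red`); the grounded fact it illustrates is that the log-loss is a sum over samples, so duplicate (features, label) pairs can be merged into one sample whose weight is its multiplicity, leaving the objective unchanged at every parameter vector.

```python
import numpy as np

# Synthetic data: samples 0 and 1 are exact duplicates with the same label.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
y = np.array([1, 1, 0])

# Group identical (feature vector, label) pairs and count multiplicities.
counts = {}
for x_i, y_i in zip(X, y):
    counts[(tuple(x_i), y_i)] = counts.get((tuple(x_i), y_i), 0) + 1
X_red = np.array([list(k[0]) for k in counts])
y_red = np.array([k[1] for k in counts])
w_red = np.array(list(counts.values()), dtype=float)  # sample weights

def logloss(theta, X, y, weights=None):
    """Weighted binary logistic loss; weights default to all ones."""
    if weights is None:
        weights = np.ones(len(y))
    z = X @ theta
    # Per-sample loss for y in {0, 1}: log(1 + exp(-z)) + (1 - y) * z
    return np.sum(weights * (np.log1p(np.exp(-z)) + (1 - y) * z))

theta = np.array([0.3, -0.7])
# The reduced weighted problem has the same objective value everywhere.
assert np.isclose(logloss(theta, X, y), logloss(theta, X_red, y_red, w_red))
```

Because the two objectives agree as functions of `theta` (not just at the optimum), any solver applied to the weighted reduced problem returns an exact optimum of the original one.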
Empirical evaluation focuses on binary logistic regression across ten benchmark datasets from OpenML and LIBSVM. The experiments show that the reduced problems retain only 30–70% of the original variables, with no measurable loss in objective value (differences below 10⁻⁸) or classification accuracy. Moreover, standard solvers (e.g., L‑BFGS‑B) run 2–5× faster on average on the reduced problems, confirming the practical benefit of the approach.
In conclusion, the paper provides a rigorous, general-purpose method for exact instance compression of convex ERM problems. By leveraging color refinement, it transcends the limitations of permutation‑symmetry methods and offers deterministic, lossless dimensionality reduction that is applicable to a wide spectrum of machine learning models, including those involving kernels and complex regularizers. The authors suggest future work on extending the theory to non‑convex losses, online/streaming settings, and integration with distributed optimization frameworks.