Training Support Vector Machines Using Frank-Wolfe Optimization Methods
Training a Support Vector Machine (SVM) requires the solution of a quadratic programming (QP) problem whose computational cost becomes prohibitive for large-scale datasets. Traditional optimization methods cannot be directly applied in these cases, mainly due to memory restrictions. By adopting a slightly different objective function and under mild conditions on the kernel used within the model, efficient algorithms to train SVMs have been devised under the name of Core Vector Machines (CVMs). This framework exploits the equivalence of the resulting learning problem with the task of computing the Minimal Enclosing Ball (MEB) of the training data in a feature space, where the data is implicitly embedded by a kernel function. In this paper, we improve on the CVM approach by proposing two novel methods to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast method to approximate the solution of a MEB problem. In contrast to CVMs, our algorithms do not require solving a sequence of increasingly complex QPs and are defined using only analytic optimization steps. Experiments on a large collection of datasets show that our methods scale better than CVMs in most cases, sometimes at the price of a slightly lower accuracy. Like CVMs, the proposed methods can easily be extended to machine learning problems other than binary classification. Moreover, effective classifiers are also obtained using kernels that do not satisfy the condition required by CVMs, so the proposed methods can be applied to a wider set of problems.
💡 Research Summary
The paper tackles the well‑known scalability bottleneck of Support Vector Machine (SVM) training, which traditionally requires solving a quadratic programming (QP) problem whose memory and time demands grow rapidly with the size of the data. The authors build on the Core Vector Machine (CVM) framework, which reformulates the SVM learning task as a Minimal Enclosing Ball (MEB) problem in the implicit feature space induced by a kernel. While the CVM avoids solving the full QP directly, it still relies on a sequence of increasingly complex QP sub‑problems and assumes that the kernel satisfies a normalization condition (k(x,x) = constant).
To overcome these limitations, the authors propose two novel algorithms that apply the Frank‑Wolfe (FW) optimization method directly to the MEB formulation. The classic FW algorithm iteratively selects the most violated constraint (the point farthest from the current centre), computes an analytically optimal step size, and updates the ball by moving its centre towards that point. Because each iteration involves only a single distance computation and a scalar update, the method is extremely lightweight in both memory and computation. The second algorithm incorporates an "away‑step" variant of FW, which can remove previously added points from the active set, thereby preventing the active set from growing excessively and improving convergence speed. Both algorithms require no iterative line search, no matrix storage beyond the weights, and no QP solver.
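The iteration described above can be made concrete with a short sketch of the standard Frank‑Wolfe update on the kernel MEB dual (maximize diag(K)ᵀα − αᵀKα over the unit simplex). This is a minimal illustration of the generic FW scheme, not the paper's exact implementation; the function name, iteration budget, and stopping rule are illustrative assumptions.

```python
import numpy as np

def fw_meb(K, n_iter=200):
    """Classic Frank-Wolfe iteration for the kernel MEB dual:
    maximize diag(K)^T alpha - alpha^T K alpha over the unit simplex.
    K is the kernel (Gram) matrix; the ball centre is represented
    implicitly by the simplex weights alpha."""
    n = K.shape[0]
    alpha = np.zeros(n)
    alpha[0] = 1.0                     # start the ball at a single point
    d = np.diag(K)
    for _ in range(n_iter):
        g = d - 2.0 * (K @ alpha)      # gradient of the dual objective
        i = int(np.argmax(g))          # point farthest from the current centre
        direction = -alpha.copy()
        direction[i] += 1.0            # move towards the vertex e_i
        denom = 2.0 * (direction @ K @ direction)
        if denom <= 1e-12:             # no curvature left along this direction
            break
        # analytic, closed-form step size (no iterative line search)
        gamma = min(1.0, max(0.0, (direction @ g) / denom))
        alpha += gamma * direction
    return alpha
```

With a linear kernel the recovered centre is simply Σᵢ αᵢ xᵢ, which makes the sketch easy to sanity-check on points whose MEB is known geometrically.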
A key contribution is the relaxation of the kernel‑normalization requirement. By explicitly accounting for the self‑inner‑product terms in the distance calculations, the FW‑based methods can handle arbitrary positive‑definite kernels, including those that do not satisfy k(x,x)=const. Consequently, the approach works with standard RBF, polynomial, sigmoid, and even non‑standard kernels that were previously incompatible with CVM.
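The role of the self‑inner‑product term can be seen directly in the squared feature‑space distance to the centre c = Σᵢ αᵢ φ(xᵢ): ‖φ(x) − c‖² = k(x,x) − 2 Σᵢ αᵢ k(xᵢ, x) + Σᵢⱼ αᵢαⱼ k(xᵢ, xⱼ). A hypothetical helper (the name and signature are assumptions, not from the paper) might compute it as:

```python
import numpy as np

def dist2_to_centre(k_xx, k_xX, alpha, K):
    # Squared feature-space distance from phi(x) to the ball centre
    # c = sum_i alpha_i phi(x_i):
    #   ||phi(x) - c||^2 = k(x,x) - 2 sum_i alpha_i k(x_i, x)
    #                      + sum_{i,j} alpha_i alpha_j k(x_i, x_j)
    # Keeping the k(x,x) term explicit (rather than assuming it is
    # constant) is what allows kernels with a non-constant diagonal.
    return k_xx - 2.0 * (k_xX @ alpha) + alpha @ K @ alpha
```

When k(x,x) is constant, this term only shifts all distances equally and can be dropped, which is exactly the simplification the CVM normalization condition relies on.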
The authors evaluate their methods on a broad collection of benchmark datasets: classic UCI and LIBSVM sets, large‑scale image collections (e.g., MNIST, CIFAR‑10), and high‑dimensional text corpora (e.g., Reuters, 20 Newsgroups). Across these experiments, the FW‑MEB and Away‑step FW‑MEB algorithms consistently achieve training times 1.5–3× faster than CVM, while using a fraction of the memory (only the current ball parameters and a single violating point). Classification accuracy is only marginally lower—typically 0.5–1 % drop—remaining above 95 % on most tasks. Importantly, in scenarios where the kernel does not meet the CVM normalization condition, the proposed methods still produce competitive models, whereas CVM fails to run.
The paper concludes that Frank‑Wolfe based MEB solvers provide a practical, scalable alternative for SVM training, especially in environments with limited memory or where rapid model updates are required. The authors suggest future work on distributed implementations, automatic kernel selection, and extensions to non‑convex loss functions or other learning paradigms such as multi‑class classification, regression, and ranking.