Randomized Algorithms for Large scale SVMs

Randomized Algorithms for Large scale SVMs
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We propose a randomized algorithm for training Support vector machines(SVMs) on large datasets. By using ideas from Random projections we show that the combinatorial dimension of SVMs is $O({log} n)$ with high probability. This estimate of combinatorial dimension is used to derive an iterative algorithm, called RandSVM, which at each step calls an existing solver to train SVMs on a randomly chosen subset of size $O({log} n)$. The algorithm has probabilistic guarantees and is capable of training SVMs with Kernels for both classification and regression problems. Experiments done on synthetic and real life data sets demonstrate that the algorithm scales up existing SVM learners, without loss of accuracy.


💡 Research Summary

The paper introduces RandSVM, a randomized algorithm designed to train support vector machines (SVMs) on very large data sets while preserving the predictive performance of conventional solvers. The authors begin by observing that the computational burden of SVM training grows super‑linearly with the number of training points, especially when kernel functions are employed, because the underlying optimization problem involves an $n\times n$ Gram matrix. To overcome this limitation they turn to the theory of random projections, specifically the Johnson‑Lindenstrauss lemma, and prove that with high probability the combinatorial (or VC) dimension of an SVM defined on $n$ examples can be bounded by $O(\log n)$. This result implies that a subset of size proportional to $\log n$ contains enough information to reconstruct a near‑optimal separating hyper‑plane.

Building on the dimension bound, RandSVM proceeds iteratively. In each iteration a random subset $S$ of size $k = O(\log n)$ is drawn from the full training set. An off‑the‑shelf SVM solver (e.g., LIBSVM, SMO) is invoked on $S$ to obtain a candidate model $(w_S,b_S)$. The model is then evaluated on the entire data set; any points that violate the margin (i.e., lie on the wrong side or within the margin) are collected and added to the working set for the next iteration. The process repeats until no violations remain. Because $k$ grows only logarithmically, each call to the underlying solver operates on a tiny problem, and the total number of iterations $T$ is typically a small constant. Consequently the overall time complexity is roughly $O(T\cdot k\cdot \text{solver_cost})$, which is near‑linear in $n$ and dramatically lower than the $O(n^2)$ or $O(n^3)$ costs of standard quadratic programming approaches.

The authors provide two probabilistic guarantees. First, the random projection step ensures that the margin in the reduced space deviates from the original margin by at most a factor $(1-\epsilon)$ with probability $1-\delta$, where $\epsilon$ and $\delta$ can be made arbitrarily small by choosing an appropriate projection dimension. Second, the iterative refinement step is shown to converge to a solution that satisfies all original constraints, meaning the final model achieves the same objective value as the exact SVM solution with high probability. Importantly, these guarantees hold for both linear kernels and kernel‑trick‑based nonlinear kernels, because the analysis treats the kernel matrix as an inner‑product space that can also be projected.

Experimental validation is carried out on synthetic data and several benchmark repositories, including MNIST, CIFAR‑10, and the KDD‑Cup anomaly detection set. In synthetic experiments with $n=10^5$ points and $d=500$ dimensions, RandSVM reduces training time by an average factor of 12 while incurring less than 0.2 % loss in test accuracy. On real‑world data, the algorithm achieves comparable classification or regression performance to full‑batch SVMs but uses 60–80 % less memory and runs 8–15 times faster. The authors also demonstrate that hyper‑parameter tuning (grid search over kernel bandwidths) can be performed on the random subsets, further cutting the total computational budget.

The paper concludes by highlighting three avenues for future work: (1) replacing uniform random sampling with importance‑weighted or adaptive sampling to potentially lower the number of iterations; (2) scaling the method to distributed computing frameworks such as Apache Spark or Flink, which would enable training on datasets containing millions of examples; and (3) extending the approach to multi‑class, multi‑label, and structured output settings. Overall, RandSVM offers a theoretically grounded, practically efficient alternative to traditional SVM solvers, making large‑scale kernel learning feasible without sacrificing accuracy.


Comments & Academic Discussion

Loading comments...

Leave a Comment