Fast Training of Effective Multi-class Boosting Using Coordinate Descent Optimization
We present a novel column generation based boosting method for multi-class classification. Our multi-class boosting is formulated in a single optimization problem as in Shen and Hao (2011). Unlike most existing multi-class boosting methods, which use the same set of weak learners for all classes, we train class-specified weak learners (i.e., each class has a different set of weak learners). We show that using separate weak learner sets for each class leads to fast convergence, without introducing additional computational overhead in the training procedure. To make training more efficient and scalable, we also propose a fast coordinate descent method for solving the optimization problem at each boosting iteration. The proposed coordinate descent method is conceptually simple and easy to implement, in that there is a closed-form solution for each coordinate update. Experimental results on a variety of datasets show that, compared to a range of existing multi-class boosting methods, the proposed method has a much faster convergence rate and better generalization performance in most cases. We also empirically show that the proposed fast coordinate descent algorithm needs less training time than the MultiBoost algorithm of Shen and Hao (2011).
💡 Research Summary
The paper introduces a novel multi‑class boosting framework that departs from the conventional practice of sharing a single pool of weak learners across all classes. Instead, it trains a distinct set of weak learners for each class, generating K new weak learners (one per class) at every boosting iteration. This class‑specific weak learner strategy leads to a denser weight vector and dramatically accelerates convergence without incurring extra computational cost, because the sub‑problem for generating the most violated constraints remains of the same complexity as in the original MultiBoost formulation.
Formally, the authors adopt a large‑margin formulation with exponential loss:
min_{w≥0} ‖w‖₁ + C ∑_{i=1}^m ∑_{y≠y_i} exp(−γ(i,y)),
where γ(i,y)=w_{y_i}ᵀh_{y_i}(x_i)−w_yᵀh_y(x_i) is the class‑margin. The primal problem is solved via column generation: a master problem (restricted to the currently generated weak learners) and a sub‑problem that, for each class c, finds the weak learner that maximizes the violation of the dual constraints. The sub‑problem’s computational cost is identical to that of the original MultiBoost, ensuring that the new approach does not increase per‑iteration overhead.
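The objective above can be made concrete with a short sketch. The code below evaluates the regularized exponential-loss objective for given class-specific weights; the array shapes, variable names, and the choice of {−1, +1} weak-learner outputs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def multiclass_exp_loss(W, H, y, C):
    """Sketch of the multi-class exponential-loss objective.

    W : (K, n) nonnegative weak-learner weights, one row w_c per class.
    H : (K, n, m) weak-learner responses h_c(x_i) for each class c,
        weak learner j, and training example i (assumed in {-1, +1}).
    y : (m,) ground-truth labels in {0, ..., K-1}.
    C : loss/regularization trade-off parameter.
    """
    K, n, m = H.shape
    # Per-class scores F[c, i] = w_c^T h_c(x_i).
    F = np.einsum('cn,cnm->cm', W, H)
    loss = 0.0
    for i in range(m):
        for c in range(K):
            if c == y[i]:
                continue
            # Class margin gamma(i, c) = w_{y_i}^T h_{y_i}(x_i) - w_c^T h_c(x_i).
            gamma = F[y[i], i] - F[c, i]
            loss += np.exp(-gamma)
    # L1 norm of the weights plus the weighted exponential loss.
    return np.abs(W).sum() + C * loss
```

Because w ≥ 0, the ‖w‖₁ term is simply the sum of the weights; column generation grows the columns of `H` (and `W`) one weak learner per class at each iteration.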
The main scalability bottleneck lies in the master problem, whose dimensionality equals K × n (number of classes times number of weak learners). Prior work relied on generic solvers such as MOSEK or L‑BFGS, which become impractical for large‑scale data. To address this, the authors propose a Fast Coordinate Descent (FCD) algorithm tailored to the multi‑class boosting objective. By fixing all but one coordinate w_j, the sub‑problem reduces to a one‑dimensional convex function:
g(w_j) = |w_j| + C ∑_{i=1}^m ∑_{y≠y_i} exp(−γ(i,y)),
where the margins γ(i,y) are evaluated with all coordinates other than w_j held fixed. Since w ≥ 0, |w_j| = w_j, and this one-dimensional convex problem admits a closed-form minimizer, which is what makes each FCD coordinate update cheap.
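To illustrate the kind of closed-form coordinate step involved, consider the common case where weak-learner responses lie in {−1, +1}: the exponential terms then collect into A·e^{w} + B·e^{−w} (plus a constant), with A, B ≥ 0 aggregating the terms whose margins shrink or grow with w. The helper below is a hypothetical sketch of minimizing that one-dimensional function, not the paper's exact FCD update.

```python
import numpy as np

def coordinate_update(A, B):
    """Closed-form minimizer of g(w) = w + A*e^w + B*e^{-w} over w >= 0.

    Setting g'(w) = 1 + A*e^w - B*e^{-w} = 0 and substituting u = e^w
    gives the quadratic A*u^2 + u - B = 0; its positive root yields the
    unconstrained minimizer, clipped at 0 to keep the weight nonnegative.
    A, B >= 0 are assumed precomputed from the current margins.
    """
    if A == 0.0:
        # g'(w) = 1 - B*e^{-w}: minimizer at w = log(B) when B > 1.
        return max(0.0, np.log(B)) if B > 0.0 else 0.0
    u = (-1.0 + np.sqrt(1.0 + 4.0 * A * B)) / (2.0 * A)
    return max(0.0, np.log(u)) if u > 0.0 else 0.0
```

Cycling such analytic updates over the K × n coordinates avoids calling a generic solver (MOSEK, L-BFGS) on the master problem.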