MIS-Boost: Multiple Instance Selection Boosting
In this paper, we present a new multiple instance learning (MIL) method, called MIS-Boost, which learns discriminative instance prototypes by explicit instance selection in a boosting framework. Unlike previous instance-selection-based MIL methods, we do not restrict the prototypes to a discrete set of training instances but allow them to take arbitrary values in the instance feature space. We also do not restrict the total number of prototypes or the number of selected instances per bag; these quantities are completely data-driven. We show that MIS-Boost outperforms state-of-the-art MIL methods on a number of benchmark datasets. We also apply MIS-Boost to large-scale image classification, where we show that the automatically selected prototypes map to visually meaningful image regions.
💡 Research Summary
The paper introduces MIS‑Boost (Multiple Instance Selection Boosting), a novel multiple‑instance learning (MIL) algorithm that learns discriminative instance prototypes without restricting them to the finite set of training instances. Traditional MIL approaches either infer instance labels directly from bag labels or select a limited set of prototypes from the training data, often fixing the number of prototypes a priori. MIS‑Boost removes both constraints by allowing prototypes to reside anywhere in the continuous feature space ℝⁿ and by determining the required number of prototypes automatically through a boosting framework.
Formally, given bags B_i = {x_{i1}, …, x_{in_i}} with bag‑level labels y_i ∈ {−1, +1}, the final classifier is an additive model F(B) = sign(∑_{m=1}^{M} f_m(B)). Each base learner f_m is associated with a prototype p_m and is defined as a scaled sigmoid of the distance between p_m and the bag: f_m(B) = 2/(1 + exp(−(β₁·D(p_m, B) + β₀))) − 1, where D(p, B) = min_j ||p − x_{ij}||. To make D differentiable, the authors replace the hard min with a soft‑min approximation D̃(p, B) = Σ_j π_j ||p − x_{ij}||, where π_j = exp(−α||p − x_{ij}||) / Σ_k exp(−α||p − x_{ik}||). For sufficiently large α, D̃ closely approximates the true min while remaining smooth for gradient‑based optimization.
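The base learner above can be sketched in a few lines of NumPy. This is a minimal illustration of the soft‑min distance and the scaled sigmoid, not the authors' code; the function names are ours:

```python
import numpy as np

def soft_min_distance(p, bag, alpha=10.0):
    """Soft-min approximation of D(p, B) = min_j ||p - x_ij||: a softmax-weighted
    average of prototype-to-instance distances. Large alpha concentrates the
    weights pi_j on the nearest instance while the function stays smooth in p."""
    d = np.linalg.norm(bag - p, axis=1)   # ||p - x_ij|| for every instance j in the bag
    w = np.exp(-alpha * (d - d.min()))    # shift by min(d) for numerical stability
    pi = w / w.sum()                      # softmax weights pi_j
    return float(np.dot(pi, d))

def base_learner(p, bag, beta0, beta1, alpha=10.0):
    """f_m(B) = 2 / (1 + exp(-(beta1 * D(p_m, B) + beta0))) - 1, with output in (-1, 1)."""
    D = soft_min_distance(p, bag, alpha)
    return 2.0 / (1.0 + np.exp(-(beta1 * D + beta0))) - 1.0
```

Note that α = 0 recovers a plain average of the distances, while α → ∞ recovers the hard min.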
Training proceeds within Gentle‑AdaBoost. At each boosting iteration, a weighted least‑squares problem is solved to find the prototype p_m and sigmoid parameters (β₀, β₁) that minimize the weighted error ε_m = Σ_i w_i (y_i − f_m(B_i))². Because the objective is non‑convex, the authors employ a coordinate‑descent scheme: (1) initialize p_m to a cluster centroid obtained by k‑means clustering of all instances (K=100 in experiments); (2) fix p_m and solve for (β₀, β₁) analytically; (3) fix (β₀, β₁) and update p_m via gradient descent on the smooth objective; repeat until convergence. Multiple initializations (one per cluster) are tried, and the prototype yielding the lowest error is retained.
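One boosting round can be sketched as follows. This is a simplification, not the paper's exact procedure: a coarse grid search over (β₀, β₁) stands in for the analytic/coordinate‑descent update, and candidate prototypes come from a fixed pool (e.g., k‑means centroids) without the gradient refinement of p; all names are illustrative:

```python
import numpy as np

def soft_min_dist(p, bag, alpha=10.0):
    """Smooth stand-in for min_j ||p - x_ij|| (the soft-min described above)."""
    d = np.linalg.norm(bag - p, axis=1)
    w = np.exp(-alpha * (d - d.min()))
    return float(np.dot(w / w.sum(), d))

def gentle_boost_round(bags, y, w, candidates, grid):
    """One Gentle-AdaBoost iteration: for each candidate prototype p, pick the
    (b0, b1) minimizing the weighted squared error eps = sum_i w_i (y_i - f_m(B_i))^2,
    keep the overall best base learner, then reweight the bags."""
    best = None
    for p in candidates:
        D = np.array([soft_min_dist(p, b) for b in bags])
        for b0 in grid:
            for b1 in grid:
                f = 2.0 / (1.0 + np.exp(-(b1 * D + b0))) - 1.0
                eps = float(np.sum(w * (y - f) ** 2))
                if best is None or eps < best[0]:
                    best = (eps, p, b0, b1, f)
    _, p, b0, b1, f = best
    w_new = w * np.exp(-y * f)            # standard Gentle-AdaBoost reweighting
    return (p, b0, b1), f, w_new / w_new.sum()
```

Repeating this round and summing the returned responses f yields the additive model F(B).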
The number of base learners M is not fixed. The algorithm runs for a large maximum M (e.g., 100) while maintaining a validation set. After each iteration, the validation error is recorded; the iteration with the smallest validation error determines the final number of prototypes M*. This cross‑validation step mitigates over‑fitting and yields a data‑driven model complexity.
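This validation-based selection of M* amounts to early stopping over the partial ensembles. A small sketch, assuming the per-round base-learner outputs on the held-out bags have been stored (hypothetical helper name):

```python
import numpy as np

def choose_num_prototypes(round_outputs, y_val):
    """Given per-round base-learner outputs f_m(B) on the validation bags
    (rows: boosting rounds, columns: bags), return the 1-indexed M* whose
    partial ensemble F_M = sum_{m<=M} f_m has the lowest validation error."""
    F = np.cumsum(round_outputs, axis=0)                  # partial sums F_1, ..., F_M
    errs = np.mean(np.sign(F) != y_val[None, :], axis=1)  # 0/1 error per round
    return int(np.argmin(errs)) + 1                       # earliest minimizer wins ties
```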
Empirical evaluation covers five classic MIL benchmarks (Musk1, Musk2, Elephant, Fox, Tiger) and two COREL image classification datasets. Using 10‑fold cross‑validation on the benchmarks, MIS‑Boost consistently outperforms or matches state‑of‑the‑art methods such as MI‑Boost, mi‑Graph, MIForest, and MILES. On the large‑scale image classification task, the learned prototypes correspond to visually meaningful image patches (e.g., object parts), demonstrating that the algorithm captures semantically relevant structures rather than arbitrary feature directions.
Key contributions include: (1) freeing prototype selection from the discrete training set, enabling richer representations; (2) integrating prototype learning directly into a boosting objective, so that prototype selection is guided by classification loss; (3) employing a soft‑min approximation to retain differentiability while preserving the intuitive “closest‑instance” semantics; (4) automatically determining the number of prototypes via validation‑based early stopping.
Limitations are acknowledged. The soft‑min parameter α influences the approximation quality and may require tuning. The coordinate‑descent optimization is susceptible to local minima, especially in high‑dimensional spaces. The reliance on k‑means for initialization introduces sensitivity to the choice of K. Future work could explore adaptive α strategies, more robust global optimization techniques (e.g., variational inference or meta‑heuristics), and alternative initialization schemes.
In summary, MIS‑Boost presents a principled, flexible, and empirically strong solution to MIL by learning continuous prototypes within a boosting framework, thereby advancing both the theoretical understanding and practical performance of multiple‑instance learning.