Multi-Instance Learning with Any Hypothesis Class
In the supervised learning setting termed Multiple-Instance Learning (MIL), the examples are bags of instances, and the bag label is a function of the labels of its instances. Typically, this function is the Boolean OR. The learner observes a sample of bags and the bag labels, but not the instance labels that determine the bag labels. The learner is then required to emit a classification rule for bags based on the sample. MIL has numerous applications, and many heuristic algorithms have been used successfully on this problem, each adapted to specific settings or applications. In this work we provide a unified theoretical analysis for MIL, which holds for any underlying hypothesis class, regardless of a specific application or problem domain. We show that the sample complexity of MIL is only poly-logarithmically dependent on the size of the bag, for any underlying hypothesis class. In addition, we introduce a new PAC-learning algorithm for MIL, which uses a regular supervised learning algorithm as an oracle. We prove that efficient PAC-learning for MIL can be generated from any efficient non-MIL supervised learning algorithm that handles one-sided error. The computational complexity of the resulting algorithm is only polynomially dependent on the bag size.
💡 Research Summary
The paper presents a unified theoretical framework for Multiple‑Instance Learning (MIL) that works for any underlying hypothesis class, extending beyond the classic Boolean‑OR bag labeling. The authors formalize MIL via a known bag‑label function ψ that maps any r‑tuple of instance labels, for an allowed bag size r ∈ R, to a single bag label in the label set I. Given an instance hypothesis class H ⊆ I^X, the induced bag hypothesis class is H_ψ = {h_ψ : h ∈ H}, where h_ψ(x̄) = ψ(h(x₁), …, h(x_|x̄|)). The learning goal is to find a bag classifier whose expected loss is close to the optimal loss ℓ*(H, D) under an unknown distribution D over labeled bags.
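The lifting of an instance hypothesis to a bag hypothesis can be sketched in a few lines; this is an illustrative toy (a threshold hypothesis on scalar instances and the Boolean‑OR bag function), not code from the paper:

```python
# A minimal sketch (not code from the paper) of lifting an instance
# hypothesis h to the induced bag hypothesis h_psi.
from typing import Callable, Sequence

def lift(h: Callable[[float], int],
         psi: Callable[[Sequence[int]], int]) -> Callable[[Sequence[float]], int]:
    """Return h_psi with h_psi(bag) = psi(h(x_1), ..., h(x_r))."""
    return lambda bag: psi([h(x) for x in bag])

# Classic MIL: Boolean-OR bag labels over a hypothetical threshold
# hypothesis on scalar instances (both are illustrative choices).
boolean_or = lambda labels: int(any(labels))
h = lambda x: int(x > 0.5)
h_or = lift(h, boolean_or)

h_or([0.1, 0.2, 0.9])  # one positive instance -> bag labeled 1
h_or([0.1, 0.2, 0.3])  # no positive instance  -> bag labeled 0
```

Note that h_or is a function of the whole bag, so the same construction works for any ψ the learner knows in advance.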
The first major contribution is a sample‑complexity analysis that holds for arbitrary bag‑label functions ψ and for both binary and real‑valued hypothesis classes. Using covering numbers, VC‑dimension arguments, and Rademacher complexity, the authors prove that the number of labeled bags needed grows only logarithmically with the maximum bag size. This result is distribution‑free and does not rely on any independence assumption among instances inside a bag; it therefore applies to the more realistic setting where instances may be arbitrarily correlated. For margin‑based learning and for bag functions derived from p‑norms (including the average ψ₁ and the max ψ_∞), they obtain poly‑logarithmic bounds that interpolate between the two extremes via the p‑norm inequality.
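The shape of these bounds for a binary hypothesis class can be stated schematically (the exact constants and form are in the paper and not reproduced here; the O(d log r) capacity growth below is an illustrative reading of the logarithmic dependence described above). With d the VC dimension of H and r the maximum bag size, the standard agnostic PAC bound gives:

```latex
\mathrm{VC}(\mathcal{H}_\psi) = O(d \log r)
\quad\Longrightarrow\quad
m(\epsilon, \delta) = O\!\left(\frac{d \log r + \log(1/\delta)}{\epsilon^{2}}\right).
```

The bag size r thus enters the sample complexity only through a logarithmic factor.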
The second major contribution is an efficient PAC‑learning algorithm for MIL that leverages any non‑MIL learning algorithm capable of handling one‑sided label noise. The algorithm proceeds as follows: (1) treat each bag as a collection of instances; (2) invoke the base learner A on these instances to obtain a hypothesis with one‑sided error, i.e., one that never misclassifies positive instances while allowing a bounded error ε on negatives; (3) apply the known ψ to the instance‑level predictions to compute a bag‑level prediction. Because ψ is known, this last step is computationally trivial (e.g., for Boolean OR, a single positively classified instance makes the bag positive). The overall runtime is polynomial in the number of bags and in the maximum bag size, and the output hypothesis, though possibly “improper” (i.e., not belonging to H_ψ), achieves expected loss at most ℓ* + ε with high probability. This construction shows that efficient PAC‑learning of MIL is possible for any hypothesis class with a suitable one‑sided‑error learner, in contrast with earlier hardness results for specific classes such as Axis‑Parallel Rectangles (APRs), where efficient learning from arbitrary bag distributions would imply an unlikely complexity collapse (RP = NP).
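The three steps above can be sketched for the Boolean‑OR setting as follows; `toy_one_sided_learner` stands in for the one‑sided‑error oracle A, and all names here are illustrative rather than the paper's:

```python
# Hedged sketch of the MIL reduction, steps (1)-(3), for Boolean-OR bags.
from typing import Callable, List, Sequence, Tuple

Bag = Sequence[float]

def toy_one_sided_learner(sample: List[Tuple[float, int]]) -> Callable[[float], int]:
    # Toy threshold learner (assumption): never labels a negatively-
    # labeled training instance as positive, so its error is one-sided.
    neg = [x for x, y in sample if y == 0]
    t = max(neg) if neg else 0.0
    return lambda x: int(x > t)

def mil_reduce(bags: List[Tuple[Bag, int]],
               learn: Callable[[List[Tuple[float, int]]], Callable[[float], int]],
               psi: Callable[[Sequence[int]], int]) -> Callable[[Bag], int]:
    # (1) Flatten bags, giving each instance its bag's label; for OR-bags
    #     the noise is one-sided (instances from negative bags are truly
    #     negative, but a positive bag may contain negative instances).
    instances = [(x, y) for bag, y in bags for x in bag]
    # (2) Invoke the base learner on the noisy instance sample.
    h = learn(instances)
    # (3) Classify a new bag by applying the known psi to the
    #     instance-level predictions.
    return lambda bag: psi([h(x) for x in bag])

boolean_or = lambda labels: int(any(labels))
bags = [([0.1, 0.9], 1), ([0.2, 0.3], 0), ([0.8, 0.1], 1)]
predict = mil_reduce(bags, toy_one_sided_learner, boolean_or)
```

Here `predict` labels a new bag positive exactly when some instance exceeds the learned threshold, mirroring how a single positively classified instance suffices under Boolean OR.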
The paper also discusses two families of bag‑label functions. The first extends monotone Boolean functions to the real domain by replacing OR with max and AND with min, yielding a class M_n defined inductively. The second family consists of p‑norm based functions ψ_p that map bounded real instance predictions to a bag prediction via a normalized p‑norm (ψ₁ is the average, ψ_∞ is the max). The authors prove that for any ψ_p, the sample‑complexity bounds remain logarithmic in bag size, and the PAC algorithm works unchanged because ψ_p is efficiently computable.
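The p‑norm family can be made concrete with a short sketch; the normalization below (a power mean over the bag) is one natural reading and is stated as an assumption, since the paper's exact scaling may differ:

```python
# Sketch of the p-norm bag functions: psi_p maps instance predictions
# (assumed in [0, 1]) to a normalized p-norm, so psi_1 is the average
# and psi_inf is the max.
import math
from typing import Sequence

def psi_p(preds: Sequence[float], p: float) -> float:
    r = len(preds)
    if math.isinf(p):
        return max(preds)
    return (sum(v ** p for v in preds) / r) ** (1.0 / p)

preds = [0.2, 0.4, 0.9]
psi_p(preds, 1)         # average of the predictions
psi_p(preds, math.inf)  # max of the predictions
# By the power-mean inequality, psi_p interpolates monotonically
# between the two extremes as p grows from 1 to infinity.
```

Since each ψ_p is computable in time linear in the bag size, the PAC reduction applies unchanged to this family.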
In the related work section, the authors contrast their results with prior analyses that either assumed i.i.d. instances within bags (leading to simple reductions to standard learning) or demonstrated hardness for specific hypothesis classes under arbitrary bag distributions. They argue that their general approach subsumes these earlier results and provides a systematic method to transfer any non‑MIL learning guarantee (including computational efficiency) to the MIL setting, provided ψ is known and the base learner tolerates one‑sided errors.
Overall, the paper delivers a comprehensive theory that (i) bounds the sample complexity of MIL independently of bag size beyond a logarithmic factor, (ii) supplies a practical reduction from MIL to standard supervised learning via a one‑sided error oracle, and (iii) shows that this reduction works for a broad spectrum of bag‑label functions and hypothesis classes. The results bridge a gap between heuristic MIL algorithms and rigorous learning theory, offering both theoretical insight and a blueprint for constructing efficient MIL learners from existing supervised learning tools.