Partial Feedback Online Learning
We study a new learning protocol, termed partial-feedback online learning, where each instance admits a set of acceptable labels, but the learner observes only one acceptable label per round. We highlight that, while the classical version space is widely used to study online learnability, it does not directly extend to this setting. We address this obstacle by introducing a collection version space, which maintains sets of hypotheses rather than individual hypotheses. Using this tool, we obtain a tight characterization of learnability in the set-realizable regime. In particular, we define the Partial-Feedback Littlestone dimension (PFLdim) and the Partial-Feedback Measure Shattering dimension (PMSdim), and show that they tightly characterize the minimax regret of deterministic and randomized learners, respectively. We further identify a nested inclusion condition under which deterministic and randomized learnability coincide, resolving an open question of Raman et al. (2024b). Finally, we show that beyond set realizability the minimax regret can be linear even when |H| = 2, exposing a fundamental barrier in the unrestricted regime.
💡 Research Summary
The paper introduces a novel online learning protocol called partial‑feedback online learning (PFOL). In each round an instance xₜ arrives together with a hidden set of acceptable labels Sₜ ⊆ Y. The learner makes a prediction ˆyₜ (or a distribution ˆπₜ) and then observes only a single witness label y_visₜ ∈ Sₜ. Crucially, the learner never sees whether its own prediction belongs to Sₜ, nor does it see the full set Sₜ until the end of the game. This feedback model sits between the classic full‑information setting (where the true label is revealed) and bandit feedback (where only a binary correctness signal is given).
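To make the protocol concrete, here is a minimal simulation of one PFOL round under a hypothetical toy instantiation (the hypothesis class, labels, and function names below are illustrative, not from the paper). The key point is that the learner observes one witness label from the hidden set Sₜ and never learns whether its own prediction was acceptable.

```python
import random

# Toy class: labels Y = {0, 1, 2}; instances are integers; each hypothesis
# is a dict mapping instances to labels. All names here are illustrative.
H = {
    "f1": {0: 0, 1: 1, 2: 2},
    "f2": {0: 1, 1: 1, 2: 0},
}

def play_round(x, predict, acceptable, rng):
    """One PFOL round: the learner sees x, commits to a prediction, then
    observes a single witness label from the hidden acceptable set S_t."""
    y_hat = predict(x)
    S_t = acceptable(x)                # hidden from the learner
    y_vis = rng.choice(sorted(S_t))    # exactly one acceptable label revealed
    mistake = y_hat not in S_t         # the learner never observes this bit
    return y_hat, y_vis, mistake

rng = random.Random(0)
acceptable = lambda x: {H["f1"][x], H["f2"][x]}  # set-realizable: S_t = F*(x_t)
predict = lambda x: H["f1"][x]                   # naive learner follows f1
y_hat, y_vis, mistake = play_round(1, predict, acceptable, rng)
print(y_hat, y_vis, mistake)  # → 1 1 False (f1(1) = 1 lies in S_t = {1})
```

Note that `mistake` is computed only for the simulation's bookkeeping; in the protocol itself this bit is exactly what the learner is denied.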
The authors first point out that the traditional version space—the set of hypotheses consistent with observed (x, y) pairs—fails in PFOL because a single witness label does not falsify any individual hypothesis. To overcome this, they define a collection version space ˜Vₜ ⊆ 𝒫(H), i.e., a set of subsets of hypotheses. Initially ˜V₀ is the power set of H; after each round they keep only those subsets F for which the observed witness belongs to the image F(xₜ) = {f(xₜ) : f ∈ F}. This “set‑level” consistency restores monotone shrinkage and enables tree‑based shattering arguments.
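The update rule described above can be sketched for a tiny finite class as follows (a hedged illustration only: the brute-force enumeration of subsets is feasible just for small H, and the dict-based hypotheses are an assumption of this sketch).

```python
from itertools import combinations

# Hypotheses as dicts from instances to labels (illustrative toy class).
H = {
    "f1": {0: 0, 1: 1},
    "f2": {0: 1, 1: 1},
    "f3": {0: 0, 1: 0},
}

def all_nonempty_subsets(names):
    """~V_0: every non-empty subset of H (feasible only for tiny classes)."""
    return [frozenset(c) for r in range(1, len(names) + 1)
            for c in combinations(sorted(names), r)]

def image(F, x):
    """F(x) = {f(x) : f in F}, the labels the collection F can emit on x."""
    return {H[name][x] for name in F}

def update(V, x, y_vis):
    """Keep only collections whose image on x contains the witness label."""
    return [F for F in V if y_vis in image(F, x)]

V = all_nonempty_subsets(H)        # |~V_0| = 2^3 - 1 = 7
V = update(V, 0, 1)                # witness label 1 observed on instance 0
# Only collections containing f2 survive, since f2 is the sole hypothesis
# with f(0) = 1; their sizes are 1, 2, 2, 3.
print(sorted(len(F) for F in V))   # → [1, 2, 2, 3]
```

This is exactly the monotone shrinkage mentioned above: each round can only remove collections from ˜Vₜ, never add them.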
Based on this tool the paper proposes two new combinatorial dimensions:
- Partial‑Feedback Littlestone dimension (PFLdim) – tailored to deterministic learners. It is defined via Y‑ary trees whose nodes are instances and whose edges are labeled by possible witness labels. A tree of depth d is shattered if every root‑to‑leaf path admits a non‑empty collection of hypotheses that produces exactly the sequence of witnessed labels along the path; PFLdim is the maximal depth of a shattered tree. The authors prove that if PFLdim = k, there exists a deterministic algorithm achieving regret O(k·log T), while every deterministic algorithm incurs Ω(k) regret in the worst case.
- Partial‑Feedback Measure Shattering dimension (PMSdim) – designed for randomized learners. Here tree edges are annotated with measurable subsets of Y, and a path is shattered if there exists a probability distribution over hypotheses that assigns positive mass to each edge's label set; PMSdim is again the maximal depth of a shattered tree. They show that the minimax regret of optimal randomized strategies scales as Θ(√(PMSdim·T)).
The paper also identifies a nested‑inclusion property of the admissible label‑set family S(Y). When inclusion between label sets implies inclusion between their induced image sets on every instance, the two dimensions coincide (PFLdim = PMSdim). Consequently, deterministic and randomized learnability are equivalent under this condition, answering an open question of Raman et al. (2024b) about the necessity of a finite Helly number.
The authors focus primarily on the set‑realizable regime, where there exists a fixed subset F⋆ ⊆ H such that Sₜ = F⋆(xₜ) for all t. In this regime the collection version space shrinks toward F⋆, and the regret bounds derived from PFLdim and PMSdim are exact.
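The set-realizable regime can be illustrated with a short self-contained sketch (toy class and witness sequence are hypothetical): because every witness label is drawn from F⋆(xₜ), the fixed collection F⋆ can never be eliminated from the collection version space.

```python
from itertools import combinations

# Toy illustration of set-realizability: the adversary's acceptable set each
# round is exactly the image of a fixed F* subset of H (names illustrative).
H = {"f1": {0: 0, 1: 1}, "f2": {0: 1, 1: 0}, "f3": {0: 0, 1: 0}}
F_star = frozenset({"f1", "f2"})

def image(F, x):
    """F(x) = {f(x) : f in F}."""
    return {H[f][x] for f in F}

# ~V_0: all non-empty subsets of H.
V = [frozenset(c) for r in range(1, 4) for c in combinations(sorted(H), r)]
for x, y_vis in [(0, 1), (1, 1), (0, 0), (1, 0)]:  # witnesses drawn from F*(x)
    assert y_vis in image(F_star, x)               # set-realizability check
    V = [F for F in V if y_vis in image(F, x)]     # collection VS update

# F* survives every update, since each witness lies in F*(x_t) by assumption.
print(F_star in V)  # → True
```

After these four rounds only the collections containing both f1 and f2 remain, so the version space has indeed shrunk toward F⋆.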
Beyond set‑realizability, they demonstrate a stark impossibility: even with only two hypotheses (|H| = 2), the minimax regret can be linear in T. They construct an adversarial sequence that alternates which hypothesis is “correct” in a way that is indistinguishable given the learner's limited feedback, so that no algorithm, deterministic or randomized, can achieve sublinear regret. This highlights a fundamental barrier unique to PFOL, distinct from the agnostic or existence‑realizable settings studied in other online learning models.
Technical contributions include: (i) a coarse finite‑class learnability bound (regret ≤ Σ_{i=1}^{⌊n/2⌋} n_i) using a simple for‑loop strategy on the collection version space; (ii) tight dimension‑based upper and lower bounds for both deterministic and randomized learners; (iii) a proof that the nested‑inclusion condition suffices for deterministic/randomized equivalence; and (iv) a linear‑regret lower bound for non‑set‑realizable cases.
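A loose sketch of contribution (i) may help: the idea of a "for-loop" strategy is to commit to one candidate collection at a time and advance only when a witness label falsifies it. This is not the paper's exact algorithm or bound; the candidate ordering and prediction rule below are assumptions of the sketch.

```python
from itertools import combinations

class ForLoopLearner:
    """Iterate over candidate collections; discard the current one as soon
    as a witness label falls outside its image on the observed instance."""

    def __init__(self, H):
        self.H = H
        # The for-loop's agenda: all non-empty subsets of H, in a fixed order.
        self.candidates = [frozenset(c) for r in range(1, len(H) + 1)
                           for c in combinations(sorted(H), r)]
        self.i = 0  # index of the active candidate collection

    def image(self, F, x):
        return {self.H[f][x] for f in F}

    def predict(self, x):
        # Any fixed rule over the candidate's image works for this sketch.
        return min(self.image(self.candidates[self.i], x))

    def observe(self, x, y_vis):
        if y_vis not in self.image(self.candidates[self.i], x):
            self.i += 1  # candidate falsified: advance the for-loop

learner = ForLoopLearner({"f1": {0: 0, 1: 1}, "f2": {0: 1, 1: 0}})
learner.observe(0, 1)  # witness 1 falsifies {f1}, since f1(0) = 0
print(learner.candidates[learner.i])  # the learner moves on to {f2}
```

The point of the sketch is only the control flow: each candidate is abandoned at most once, so the number of candidate switches, and hence (in the set-realizable regime) the number of avoidable mistakes, is bounded by the length of the agenda.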
In summary, the paper provides a complete theoretical framework for online learning with only a single correct label observed per round. By introducing the collection version space and the two shattering dimensions, it precisely characterizes when sublinear regret is possible, when deterministic and randomized strategies differ, and where fundamental impossibility arises. These results open avenues for future work on noise‑sensitive complexity measures and algorithmic designs that can operate under the extremely limited feedback inherent in many real‑world annotation pipelines.