Online Structured Prediction via Coactive Learning

We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. At each step, the system (e.g. search engine) receives a context (e.g. query) and predicts an object (e.g. ranking). The user responds by correcting the system if necessary, providing a slightly improved – but not necessarily optimal – object as feedback. We argue that such feedback can often be inferred from observable user behavior, for example, from clicks in web-search. Evaluating predictions by their cardinal utility to the user, we propose efficient learning algorithms that have ${\cal O}(\frac{1}{\sqrt{T}})$ average regret, even though the learning algorithm never observes cardinal utility values as in conventional online learning. We demonstrate the applicability of our model and learning algorithms on a movie recommendation task, as well as ranking for web-search.


💡 Research Summary

The paper introduces Coactive Learning, a novel interactive learning framework designed for scenarios where a learning system and a human user share the common objective of maximizing the user’s utility, yet the system never observes explicit utility values. In each round, the system receives a context (e.g., a search query) and produces a structured output (e.g., a ranking or recommendation list). The user then provides feedback by supplying a slightly better object—one that improves upon the system’s prediction but is not necessarily optimal. This type of feedback can be inferred from observable user behavior such as clicks, dwell time, or purchase actions, making the model highly relevant for real‑world applications where direct utility measurement is infeasible.

Problem formulation: Let $X$ be the context space, $Y$ the structured output space, and $U: X \times Y \rightarrow \mathbb{R}$ the unknown utility function. At round $t$ the learner predicts $\hat{y}_t \in Y$. The user returns a correction $y_t$ satisfying a margin condition $U(x_t, y_t) \ge U(x_t, \hat{y}_t) + \gamma$ for some $\gamma > 0$. The learner never sees $U$ directly; it only observes the pair $(\hat{y}_t, y_t)$.
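One round of this interaction protocol can be sketched as follows, assuming a linear utility model $U(x, y) = w^{*\top}\phi(x, y)$; the feature map `phi`, the candidate set `Y`, and the hidden `w_star` below are illustrative placeholders, not part of the paper's formal setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                   # feature dimension (arbitrary)
w_star = rng.normal(size=d)             # hidden user utility, never shown to learner
w = np.zeros(d)                         # learner's current weight estimate

def phi(x, y):
    """Toy joint feature map: elementwise product of context and object."""
    return x * y

# Context and a finite set of candidate structured outputs (toy vectors).
x = rng.normal(size=d)
Y = [rng.normal(size=d) for _ in range(10)]

# Learner predicts the utility-maximizing object under its current model.
y_hat = max(Y, key=lambda y: w @ phi(x, y))

# User responds with ANY object strictly better under the TRUE utility
# (slightly improved, not necessarily optimal), if one exists.
better = [y for y in Y if w_star @ phi(x, y) > w_star @ phi(x, y_hat)]
y_bar = better[0] if better else y_hat

# The learner only ever observes the pair (y_hat, y_bar), never U itself.
```

Note that the feedback object `y_bar` is weakly better than the prediction by construction, mirroring the margin condition above.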

Algorithms: Two algorithms are proposed. The first, Coactive Perceptron (CP), updates a linear weight vector $w$ by adding the feature difference $\phi(x_t, y_t) - \phi(x_t, \hat{y}_t)$ scaled by a learning rate $\eta_t$. The second, COCOA (Confidence‑Weighted Coactive Algorithm), augments CP with a confidence matrix, performing a KL‑divergence‑minimizing update that adapts the step size according to uncertainty in the weight estimate. Both algorithms operate in an online fashion and require only the feature representation $\phi$ of context‑output pairs.
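The perceptron-style update described above is a one-line rule; a minimal sketch, where the feature vectors and the choice $\eta_t = 1$ are illustrative assumptions:

```python
import numpy as np

def coactive_perceptron_update(w, phi_feedback, phi_predicted, eta=1.0):
    """w <- w + eta * (phi(x, y_bar) - phi(x, y_hat)):
    move the weights toward the user's (better) object."""
    return w + eta * (phi_feedback - phi_predicted)

w = np.zeros(3)
phi_pred = np.array([1.0, 0.0, 1.0])   # features of the predicted object
phi_fb = np.array([0.0, 1.0, 1.0])     # features of the user's correction
w = coactive_perceptron_update(w, phi_fb, phi_pred)
# w is now [-1., 1., 0.]: utility mass shifts toward the feature
# the user preferred and away from the one they implicitly rejected.
```

Features shared by both objects (the third coordinate here) cancel in the difference, so the update only changes weights on features that distinguish the correction from the prediction.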

Theoretical results: Assuming bounded feature norms ($\|\phi\| \le B$) and the $\gamma$-margin condition, the authors prove an average regret bound of ${\cal O}(\frac{1}{\sqrt{T}})$, matching the rate stated in the abstract.
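The bound follows a perceptron-style argument; a sketch, assuming linear utility $U(x, y) = w^{*\top}\phi(x, y)$ and the additive update above with $\eta_t = 1$:

```latex
\begin{aligned}
w^{*\top} w_{T+1}
  &= \sum_{t=1}^{T} w^{*\top}\bigl(\phi(x_t, y_t) - \phi(x_t, \hat{y}_t)\bigr)
   = \sum_{t=1}^{T} \bigl( U(x_t, y_t) - U(x_t, \hat{y}_t) \bigr), \\
\|w_{t+1}\|^2
  &\le \|w_t\|^2 + \|\phi(x_t, y_t) - \phi(x_t, \hat{y}_t)\|^2
   \le \|w_t\|^2 + 4B^2
  \;\Longrightarrow\;
  \|w_{T+1}\| \le 2B\sqrt{T}.
\end{aligned}
```

The cross term $w_t^{\top}(\phi(x_t, y_t) - \phi(x_t, \hat{y}_t))$ is non-positive because $\hat{y}_t$ maximizes $w_t^{\top}\phi(x_t, \cdot)$, which is what makes the squared-norm recursion hold. Combining the two displays via Cauchy–Schwarz gives $\sum_{t=1}^{T} (U(x_t, y_t) - U(x_t, \hat{y}_t)) \le 2B\|w^*\|\sqrt{T}$, i.e., the per-round utility gap shrinks at the ${\cal O}(\frac{1}{\sqrt{T}})$ rate.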